MySQL, Structured Query Language.

SQL (Structured Query Language, originally called SEQUEL) is the language used to work with relational database management systems (RDBMS); this page also links to other database-related websites. MySQL (officially pronounced "my ess cue el" rather than "my sequel") is a relational database management system (RDBMS), which means it stores data in separate tables rather than putting all the data in one big area. This adds flexibility as well as speed. The SQL part of MySQL stands for "Structured Query Language," which is the most common language used to access databases. The MySQL database server is the most popular open source database in the world. Its architecture makes it extremely fast and easy to customize. Extensive code reuse within the software, along with a minimalist approach to producing features with lots of functionality, gives MySQL what its developers claim is unmatched speed, compactness, stability, and ease of deployment. Its separation of the core server from the storage engine makes it possible to run with very strict control or with ultra-fast disk access, whichever is more appropriate for the situation.
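As a rough illustration of "separate tables", here is a minimal sketch that stores each fact once and combines related rows at query time with a JOIN. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server, and the customers/orders schema is made up for the example.

```python
import sqlite3

# SQLite stands in for a MySQL server here; the idea of splitting data into
# related tables and joining them at query time is the same.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders (customer_id, total_cents) VALUES (1, 1250), (1, 300), (2, 999);
""")

# Each customer's name is stored once; orders refer to it by id.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.total_cents) AS spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()

for name, n_orders, spent in rows:
    print(f"{name}: {n_orders} order(s), {spent / 100:.2f} total")
```

On MySQL the same SQL would need `AUTO_INCREMENT` on the `orders.id` column, because SQLite's `INTEGER PRIMARY KEY` assigns ids automatically and MySQL's plain primary key does not.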

Trouble Shooting SQL        SQL Injection

MySQL The MySQL homepage
MySQL Developer Zone
MySQL Downloads
MySQL Administrator
MySQL Query Browser Downloads
MySQL Reference Manual (dev.mysql)
MySQL Reference Manual (uniar.ukrnet)
Install and use MySQL
MySQL Basics A MySQL Tutorial
Introduction to MySQL SQL Tutorial
Perl Masters Basics of MySQL
TechRepublic Improving your SQL skills
PC Voyager (Various Databases)
Linux-mag
eXtropia Tutorials
SQL Tutorial
SQL Pocket Guide O'Reilly
SQL.org
SQL Server Worldwide Users Group
MySQL (doc.ddat) Reference Manual
MySQL GUI Tools Downloads
MySQL Documentation
Plus2net
Web Monkey PHP/MySQL Tutorial Overview
Database Journal MySQL, MS SQL, Access and more...
Introduction By James Hoffman
SQL Solution® (PHP) and Form Solution® (PHP)
CodeBase Database Development Tools for Programmers
Internet Related Technologies
SQL Reference Page
My SQL Users
SQL Team
Database Journal
SQL course
SQL Tools Summary
SQL Junkies
MS SQL City
MySQL Reference Manual for version 5.0.0-alpha
SQL Server Central
MySQL Tutorial
SQL Magazine

SQL Junkies: a feature-packed SQL Server community website. It is a place for developers to learn about building solutions with Microsoft SQL Server while being part of a collaborative community of peers.

MySQL Query Analyzer is a free, powerful, and simple-to-use tool for creating SQL scripts for the MySQL database engine.

SQL Protocols Discussions related to Microsoft's SQL Server Protocols - Netlibs, TDS and (new for SQL 2005) SOAP.

Advanced MySQL Database Administration

mysqld, also known as MySQL Server, is the main program that does most of the work in a MySQL installation. MySQL Server manages access to the MySQL data directory that contains databases and tables. The data directory is also the default location for other information such as log files and status files.

phpMyAdmin (Web Interface for MySQL). A free software tool written in PHP, intended to handle the administration of MySQL over the World Wide Web. phpMyAdmin supports a wide range of operations with MySQL. The most frequently used operations are supported by the user interface (managing databases, tables, fields, relations, indexes, users, permissions, etc.), while you still have the ability to directly execute any SQL statement.

Learning SQL Using phpMyAdmin

SQL Zoo Interactive SQL tutorial, learn about: SQL Server, Oracle, MySQL, DB2, Mimer, PostgreSQL, SQLite and Access.

BigDump: Staggered MySQL Dump Importer. BigDump performs a staggered import of large and very large MySQL dumps (such as phpMyAdmin dumps), even on web servers with a hard runtime limit or running in safe mode. The script executes only a small part of the huge dump and then restarts itself; each new session starts where the last one stopped. This is great for getting huge SQL files uploaded and installed, ready for use with your SQL queries.
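BigDump itself is a PHP script; the Python below is only a sketch of the staggered-import idea described above, not its code. It assumes a plain-text dump in which every statement ends with a semicolon at the end of a line (real dumps can break that rule, for example with semicolons inside string data), and it passes each statement to a caller-supplied `execute` function. `import_chunk` and `import_whole_dump` are hypothetical names.

```python
from typing import Callable, Optional


def import_chunk(path: str, offset: int, max_statements: int,
                 execute: Callable[[str], None]) -> Optional[int]:
    """Execute at most `max_statements` statements, starting at byte `offset`.

    Returns the byte offset where the next session should resume,
    or None once the end of the dump has been reached.
    """
    executed = 0
    buffer: list = []
    with open(path, "rb") as f:             # binary mode so tell()/seek() are byte offsets
        f.seek(offset)
        while executed < max_statements:
            raw = f.readline()
            if not raw:                      # end of file; an unterminated trailing statement is dropped
                return None
            line = raw.decode("utf-8").rstrip("\r\n")
            if not buffer and (not line.strip() or line.startswith("--")):
                continue                     # skip blank lines and comments between statements
            buffer.append(line)
            if line.rstrip().endswith(";"):  # simplistic end-of-statement rule (an assumption)
                execute("\n".join(buffer))
                buffer = []
                executed += 1
        return f.tell()                      # resume point for the next session


def import_whole_dump(path: str, statements_per_session: int,
                      execute: Callable[[str], None]) -> int:
    """Simulate the restarts (e.g. page reloads) until the dump is done; return the session count."""
    offset: Optional[int] = 0
    sessions = 0
    while offset is not None:
        offset = import_chunk(path, offset, statements_per_session, execute)
        sessions += 1
    return sessions
```

In a web setting, the returned offset would be carried from one request to the next (for example in the reload URL) instead of being kept in a loop.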

MySQLDumper is a backup script for MySQL databases, written in PHP and Perl. MySQLDumper uses its own technique to avoid execution interruption: it reads and saves only a certain number of commands, then calls itself via JavaScript, remembers how far in the process it was, and resumes from that point. MySQLDumper restores a backup file using the same process, so, unlike with other tools, splitting and splicing large files is no longer necessary. MySQLDumper can also write data directly into a compressed gzip (.gz) file, and the restore script can read this file directly without unpacking it. You can of course use it without compression, but gzip saves a sizeable amount of bandwidth.
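Here is a minimal Python sketch of the gzip point. It is not MySQLDumper's PHP/Perl code. The dump is written straight into a compressed file and read back line by line, so no uncompressed copy is ever created. The function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def write_compressed_dump(path: str, statements: Iterable[str]) -> None:
    """Write SQL statements straight into a .gz file (no uncompressed copy on disk)."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for stmt in statements:
            f.write(stmt.rstrip() + "\n")


def read_compressed_dump(path: str) -> Iterator[str]:
    """Yield the dump's lines, decompressing on the fly while reading."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")
```

Dumps are highly repetitive text (many `INSERT INTO ... VALUES` lines), so they usually compress well.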

mysqldump: A Database Backup Program. The mysqldump client is a backup program originally written by Igor Romanenko. It can be used to dump a database or a collection of databases for backup or transfer to another SQL server (not necessarily a MySQL server). The dump typically contains SQL statements to create the table, populate it, or both. However, mysqldump can also be used to generate files in CSV, other delimited text, or XML format.

The Database Publishing Wizard enables the deployment of SQL Server 2005 databases (both schema and data) into a shared hosting environment on either a SQL Server 2000 or 2005 server. The tool supports two modes of deployment: (1) it generates a single SQL script file that can be used to recreate a database when the only connectivity to a server is through a web-based control panel with a script execution window, or (2) it connects to a web service provided by your hoster and directly creates objects on a specified hosted database. The Database Publishing Wizard provides both a graphical and a command-line interface. In addition, it can integrate directly into Visual Studio 2005 or Visual Web Developer 2005.
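mysqldump's usual output is a logical dump: SQL statements that recreate the tables and their rows. Python's sqlite3 module has a comparable built-in, `Connection.iterdump()`. The sketch below therefore uses SQLite as a stand-in to show the dump-then-replay round trip. It says nothing about mysqldump's own options or exact output format, and the table `t1` is made up.

```python
import sqlite3

# SQLite stands in for MySQL: iterdump() yields a logical dump
# (CREATE TABLE and INSERT statements), the same idea mysqldump applies.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO t1 (name) VALUES ('alpha'), ('beta');
""")

dump_sql = "\n".join(src.iterdump())
print(dump_sql)

# "Transfer to another server": replay the dump into a fresh, empty database.
dst = sqlite3.connect(":memory:")
dst.executescript(dump_sql)
print(dst.execute("SELECT id, name FROM t1 ORDER BY id").fetchall())
```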

How do I upload large SQL files to MySQL? The solution to my problem was to use the MySQL GUI (Graphical User Interface) Tools provided by MySQL. Best of all, they are all free.

mylvmbackup is a tool for quickly creating backups of a MySQL server's data files. To perform a backup, mylvmbackup obtains a read lock on all tables and flushes all server caches to disk, creates a snapshot of the volume containing the MySQL data directory, and unlocks the tables again. The snapshot process takes only a small amount of time. When it is done, the server can continue normal operations while the actual file backup proceeds. See Lenz Grimmer's blog, Random notes about Linux, MySQL and Open Source. Also read Launchpad: mylvmbackup.
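The order of operations that keeps mylvmbackup's snapshot consistent can be outlined in Python. This is a sketch under assumptions, not mylvmbackup's code. The three callables are hypothetical hooks: one runs SQL on a single, persistent connection, one creates the volume snapshot, and one copies it. `FLUSH TABLES WITH READ LOCK` and `UNLOCK TABLES` are the standard MySQL statements for taking and releasing a global read lock.

```python
from typing import Callable


def snapshot_backup(run_sql: Callable[[str], None],
                    create_snapshot: Callable[[], str],
                    copy_snapshot: Callable[[str], None]) -> str:
    """Lock, snapshot, unlock, then copy (sketch of the sequence described above).

    run_sql must use ONE persistent connection: MySQL releases the global
    read lock when the session that took it ends.
    """
    run_sql("FLUSH TABLES WITH READ LOCK")   # block writes and flush tables
    try:
        snapshot = create_snapshot()          # e.g. an LVM snapshot; should be quick
    finally:
        run_sql("UNLOCK TABLES")              # always release the lock, even if the snapshot fails
    copy_snapshot(snapshot)                   # slow copy runs while the server works normally
    return snapshot
```

The `try`/`finally` is the key design choice. If creating the snapshot fails, the lock must still be released, otherwise the server stays blocked for writes.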

Maatkit makes MySQL easier and safer to manage. It provides simple, predictable ways to do things you cannot otherwise do. It would be nice if these features were included with MySQL, but they are not. That's why Maatkit is now shipping by default with many GNU/Linux distributions such as Debian and CentOS.  You can use Maatkit to prove replication is working correctly, fix corrupted data, automate repetitive tasks, speed up your servers, and much, much more.
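Maatkit's tool for proving that replication is correct (mk-table-checksum) computes checksums on the servers themselves, using SQL. The Python below only models the idea. It checksums the same table, in primary-key order and in fixed-size chunks, on the master and on a replica, and then reports the chunks that differ. Rows are passed in as Python tuples. The function names and chunk size are assumptions for illustration.

```python
import zlib
from itertools import zip_longest
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[object, ...]


def chunk_checksums(rows: Iterable[Row], chunk_size: int) -> List[int]:
    """CRC32 of each consecutive chunk of rows; rows must arrive in primary-key order."""
    sums: List[int] = []
    crc, count = 0, 0
    for row in rows:
        crc = zlib.crc32(repr(row).encode("utf-8"), crc)  # fold the row into the running CRC
        count += 1
        if count == chunk_size:
            sums.append(crc)
            crc, count = 0, 0
    if count:                      # final, partially filled chunk
        sums.append(crc)
    return sums


def differing_chunks(master: Sequence[Row], replica: Sequence[Row],
                     chunk_size: int = 1000) -> List[int]:
    """Indexes of chunks whose checksums differ (a missing chunk counts as different)."""
    a = chunk_checksums(master, chunk_size)
    b = chunk_checksums(replica, chunk_size)
    return [i for i, (x, y) in enumerate(zip_longest(a, b)) if x != y]
```

Checksumming in chunks narrows a mismatch down to a small range of rows, which can then be compared or re-synced on its own.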

MySQL Backup enables you to back up a consistent image of a MySQL Server's data and associated metadata via a direct connection to the MySQL server. The backup is synchronized between different storage engines and with the binary log (which can be used for point-in-time recovery). Different storage engines use different techniques to provide the best possible backup and restore. The backup image is stored as a file by the MySQL server. Note: MySQL Backup is currently being developed, and this page describes the work in progress. Online Backup of MySQL Cluster
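A toy model shows why synchronizing the backup image with the binary log matters for point-in-time recovery. You restore the image, then replay only the logged changes made after the image was taken, stopping just before an unwanted change. The event format and names are hypothetical. A real binary log records statements or row changes, not simple key/value assignments.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical log event: (log_position, key, new_value).
Event = Tuple[int, str, int]


@dataclass
class BackupImage:
    data: Dict[str, int]     # consistent copy of the data
    binlog_position: int     # log position the copy is synchronized with


def point_in_time_restore(backup: BackupImage, binlog: List[Event],
                          stop_position: int) -> Dict[str, int]:
    """Restore the image, then replay later changes up to and including stop_position."""
    state = dict(backup.data)                          # 1. restore the backup copy
    for position, key, value in binlog:                # 2. replay the log in order...
        if backup.binlog_position < position <= stop_position:
            state[key] = value                         # ...skipping what the image already has
    return state


binlog = [(1, "a", 1), (2, "b", 2), (3, "a", 10), (4, "c", 3), (5, "a", 0)]
backup = BackupImage(data={"a": 1, "b": 2}, binlog_position=2)

# Suppose the change at position 5 was a mistake: recover to just before it.
print(point_in_time_restore(backup, binlog, stop_position=4))  # {'a': 10, 'b': 2, 'c': 3}
```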

MySQL Workbench is a visual database design tool developed by MySQL. It is the successor to the DBDesigner4 project. It can display EER (Enhanced Entity-Relationship) diagrams that visualize different parts of the catalogue.

Tip/Trick: How to upload a .SQL file to a Hoster and Execute it to Deploy a SQL Database.

Planet MySQL :-

Planet MySQL - http://www.planetmysql.org/

Embedded InnoDB: InnoDB Status
Tue, 16 Mar 2010 05:59:19 +0000 - With the Embedded InnoDB plugin I’m working on, you can use the INNODB_STATUS table function in the DATA_DICTIONARY to do some pretty neat things. For example, we can see that each autocommit transaction causes an fsync, and that if you insert multiple rows in a single statement, you still get only one fsync:
drizzle> SELECT * FROM DATA_DICTIONARY.INNODB_STATUS
    -> WHERE name="fsync_req_done";
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| fsync_req_done | 25    |
+----------------+-------+
1 row in set (0 sec)

drizzle> insert into t1 values (1);
Query OK, 1 row affected (0.05 sec)

drizzle> SELECT * FROM DATA_DICTIONARY.INNODB_STATUS WHERE name="fsync_req_done";
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| fsync_req_done | 26    |
+----------------+-------+
1 row in set (0 sec)

drizzle> insert into t1 values (1),(2),(3),(4);
Query OK, 4 rows affected (0 sec)
Records: 4  Duplicates: 0  Warnings: 0

drizzle> SELECT * FROM DATA_DICTIONARY.INNODB_STATUS WHERE name="fsync_req_done";
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| fsync_req_done | 27    |
+----------------+-------+
1 row in set (0 sec)
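The fsync_req_done counts above (25, then 26, then 27) fit a simple rule. When the log is flushed at every commit (InnoDB's default durable setting), the number of fsyncs follows the number of transactions, not the number of rows. Below is a toy Python model of that rule only. It is not InnoDB code, and the class is hypothetical.

```python
class ToyRedoLog:
    """Toy model: a log that is fsync'ed once per committed transaction."""

    def __init__(self) -> None:
        self.rows_written = 0
        self.fsync_req_done = 0   # named after the counter in the session above

    def commit(self, rows: list) -> None:
        """Commit one transaction containing `rows`."""
        self.rows_written += len(rows)   # every row is written to the log...
        self.fsync_req_done += 1         # ...but the flush to disk happens once, at commit


# Mirror the session: one single-row autocommit INSERT, then one 4-row INSERT.
log = ToyRedoLog()
log.commit([(1,)])
log.commit([(1,), (2,), (3,), (4,)])
print(log.rows_written, log.fsync_req_done)     # 5 rows, 2 fsyncs

# The same four rows inserted one autocommit statement at a time.
slow = ToyRedoLog()
for value in (1, 2, 3, 4):
    slow.commit([(value,)])
print(slow.rows_written, slow.fsync_req_done)   # 4 rows, 4 fsyncs
```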
Embedded InnoDB: querying the configuration
Tue, 16 Mar 2010 05:35:03 +0000 - I am rather excited about being able to do awesome things such as this to get the current configuration of your server:
drizzle> SELECT NAME,VALUE
    -> FROM DATA_DICTIONARY.INNODB_CONFIGURATION
    -> WHERE NAME IN ("data_file_path", "data_home_dir");
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| data_file_path | NULL  |
| data_home_dir  | ./    |
+----------------+-------+
2 rows in set (0 sec)

drizzle> SELECT NAME,VALUE
    -> FROM DATA_DICTIONARY.INNODB_CONFIGURATION
    -> WHERE NAME = "io_capacity";
+-------------+-------+
| NAME        | VALUE |
+-------------+-------+
| io_capacity | 200   |
+-------------+-------+
1 row in set (0 sec)
Coming soon: status in a table. (This is for the upcoming embedded_innodb plugin, which uses the API provided by Embedded InnoDB to implement a storage engine for Drizzle.)
Percona sessions at the MySQL conference
Mon, 15 Mar 2010 22:10:19 +0000 - Many Percona employees will be at the 2010 MySQL conference. We’ll be giving a lot of informative technical talks on various topics. Here’s a list:
Morgan Tocker, Baron Schwartz: Diagnosing and Fixing MySQL Performance Problems
Peter Zaitsev: Scaling Applications with Caching, Sharding and Replication
Baron Schwartz: EXPLAIN Demystified
Vadim Tkachenko: An Overview of Flash Storage for Databases
Matt Yonkovit: The Five Minute DBA
Bill Schuler, Baron Schwartz: Performance and Feature Enhancements to MySQL and InnoDB
Fernando Ipar: PHP Object-Relational Mapping Libraries In Action
Baron Schwartz: Read-Write Splitting: Techniques, Challenges, and Solutions
Matt Yonkovit, Yves Trudeau: Choosing the Right Tools for the Job, SQL or NOSQL
Vadim Tkachenko, et al: Panel: How Solid-state Technologies are Transforming MySQL Server Performance and the Datacenter Architectures
Morgan Tocker: Understanding the Role of IO As a Bottleneck
Baron Schwartz: MySQL Graphing and Trending with Cacti
Vadim Tkachenko: XtraBackup: Hot Backups and More
Aleksandr Kuzminsky: Recovery of Lost or Corrupted InnoDB Tables
Ryan Lowe: Achieving PCI Compliance with MySQL
Yves Trudeau: How to choose High Availability solutions for MySQL
Yasufumi Kinoshita: How to Fulfil the Potential of InnoDB’s Performance and Scalability
Peter Zaitsev: Choosing Hardware and Operating Systems for MySQL
Peter Zaitsev: Instrumenting your Application for MySQL and Memcached
Peter Zaitsev: InnoDB Architecture and Performance Optimization – Part 1 and InnoDB Architecture and Performance Optimization – Part 2
In addition, you’ll be able to visit us at our booth in the expo hall, where you can play Stump The Experts with your tough problems. And Daniel Nichter will be staffing the Maatkit booth, so you can ask him all about Maatkit. If Maatkit is of interest to you, you might also want to attend the Maatkit BoF session. We look forward to seeing you there. It’s going to be a great conference! Entry posted by Baron Schwartz
Denormalization - Examples Needed
Mon, 15 Mar 2010 15:16:00 +0000 - Dear MySQL Community, I would like to ask for your help, as I am writing a report on use cases for denormalization. Could you please add a comment if you have used denormalization in the past to help solve a certain problem? If you can, please describe what the problem was, how you applied denormalization, and how effective (or not) it was. Thank you in advance.
Dbspj update, teaser for UC
Mon, 15 Mar 2010 14:07:00 +0000 - 5 months more... A reminder of what Dbspj is:
- It's a new feature for MySQL Cluster.
- It makes it possible to push down SQL joins, i.e. to evaluate the joins inside the data nodes.
Latest and greatest:
- The last 5 months were spent on SQL integration.
- A big effort went into having it used only for cases that we actually support.
- Lots of testing using RQG.
- A significant part of the bugs found in the last couple of weeks have not been related to SPJ, but were in fact "ordinary" optimizer bugs.
- We think that it's quite usable now.
And the numbers:
- My plan was to present TPC-W as "realistic" numbers.
- Now they look more like fantastic numbers.
- If I could, I would add a "25x" to my title.
- Visit my UC presentation (link) to learn more.
And FYI: we also plan to provide a feature preview source (or binary) release for (I will of course also disclose all information provided in the UC presentation after the event).
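Counting requests gives a rough idea of why pushing joins down to the data nodes helps. In this toy Python model (hypothetical classes and data, not the NDB API), a join evaluated on the SQL node needs one request for the outer table plus one lookup per outer row. A pushed-down join is answered by the data node in a single request. A real SQL node may batch lookups, so the actual gap can be smaller than this count suggests.

```python
from typing import List, Tuple

# Hypothetical tables stored on the data node.
ORDERS = [(1, 10), (2, 11), (3, 10)]    # (order_id, customer_id)
CUSTOMERS = {10: "Alice", 11: "Bob"}     # customer_id -> name


class DataNode:
    """Stand-in for a data node; `requests` counts network round trips."""

    def __init__(self) -> None:
        self.requests = 0

    def scan_orders(self) -> List[Tuple[int, int]]:
        self.requests += 1
        return list(ORDERS)

    def get_customer(self, customer_id: int) -> str:
        self.requests += 1
        return CUSTOMERS[customer_id]

    def pushed_down_join(self) -> List[Tuple[int, str]]:
        """The whole join is evaluated next to the data: one request."""
        self.requests += 1
        return [(oid, CUSTOMERS[cid]) for oid, cid in ORDERS]


def join_on_sql_node(node: DataNode) -> List[Tuple[int, str]]:
    """Nested-loop join on the SQL node: one scan plus one lookup per order."""
    return [(oid, node.get_customer(cid)) for oid, cid in node.scan_orders()]


local = DataNode()
pushed = DataNode()
assert join_on_sql_node(local) == pushed.pushed_down_join()
print(local.requests, pushed.requests)   # 4 1
```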
Percona-XtraDB version 9.1
Mon, 15 Mar 2010 04:13:37 +0000 - Dear Community, today we are announcing the new version 9.1 of the XtraDB storage engine. The binaries have been renamed to Percona-XtraDB; this applies to all packages, including RPM, DEB, and tar.gz packages.
New features in version 9.1:
- MySQL 5.1.43 is taken as the basis
- Package names changed to Percona-XtraDB
- SSL support enabled
- Profiling enabled
- Added a script to sort the LRU dump
- New supported platforms added. The full list includes: CentOS 5 (x86_64 and i386), CentOS 4 (x86_64 and i386), Debian lenny (x86_64 and i386), Debian etch (x86_64 and i386), Ubuntu Jaunty (x86_64 and i386), Ubuntu Intrepid (x86_64 and i386), Ubuntu Hardy (x86_64 and i386), FreeBSD 8 (x86_64 and i386), OpenSolaris (x86_64)
Fixed bugs:
- Bug #506894: buf_flush_LRU_recommendation() is too optimistic
Fixed mysql-tests:
- mysql, mysql_upgrade, and ssl tests
- Enabled the rpl_killed_ddl and innodb-autoinc tests
Percona-XtraDB obsoletes the mysql-server packages, so upgrading is pretty straightforward: instead of updating the currently installed packages, install Percona-XtraDB. It will replace the currently installed mysql-server, mysql-client, etc.
CentOS platform:
1. If you haven't installed the Percona YUM repository yet, please do so. It is available for both x86_64 and i386.
2. Install Percona-XtraDB:
    # yum install Percona-XtraDB-server Percona-XtraDB-client
Debian platform:
1. Add the Percona repository to sources.list:
    deb http://repo.percona.com/apt lenny main
    deb-src http://repo.percona.com/apt lenny main
   Instead of "lenny", put the name of your distribution.
2. Update the repository with "apt-get update".
3. Install the Percona-XtraDB packages:
    # apt-get install percona-xtradb-server percona-xtradb-client
The binaries for supported platforms are located at http://www.percona.com/percona-builds/Percona-XtraDB/Percona-XtraDB-5.1.43-9.1/ . You can find the latest XtraDB source code, including the development branch, on Launchpad.
Please report any bugs you find on Bugs in Percona XtraDB server. For general questions, use our Percona-discussions group; for development questions, use the Percona-dev group. For support, commercial, and sponsorship inquiries, contact Percona. Entry posted by Aleksandr Kuzminsky
Response to Community Feedback
Mon, 15 Mar 2010 03:11:16 +0000 - In our first alpha of InfiniDB 1.1, we’ve responded to a number of feedback items from our community, and we’d like to thank those of you who have been working with us on these and other things. Specifically, two recurring comments from the community about our 1.0 version were: (1) CREATE TABLE statements for wide tables sometimes take a long time to complete; (2) the amount of space used by empty and/or small tables is excessive. First, a little background as t…
Postgres at MySQL Conference?
Mon, 15 Mar 2010 02:49:00 +0000 - During the MySQL conference Call for Papers, there was some talk of getting one or two Postgres sessions into the mix, as a lot of MySQL users seem to have questions about Postgres these days. Alas, looking through the MySQLcon schedule, I don't see any on there. I've also looked through the BOFs, and there is nothing about Postgres there either. So maybe no one is interested in Postgres after all. However, I held a Postgres BOF at MySQLcon last year and we got a handful of people, and since I am going to be at MySQLcon again this year, I might as well host one again. I think it's too late to schedule one formally, but I can put some info on the schedule sheets once I'm at the conference; if you are interested in learning more about Postgres, please keep an eye out.
Thoughts on Thoughts on Drizzle :)
Mon, 15 Mar 2010 01:25:22 +0000 - Mark has some good thoughts on Drizzle. I think they’re all valid… and I have some extra thoughts too. “I have problems to solve today.” This is (of course) an active concern in my brain… If we don’t have something out that solves some set of problems with reasonable stability and reliability (and soon), then we are failing. I feel we’re getting there, and we will have a solid foundation to build upon. Drizzle replication vs. MySQL replication: “I can’t compare the two until Drizzle replication is running in production.” Completely agree. We should only say replication is stable and reliable when it really is. Realistic test suites are needed. Very defensive programming of the replication system is needed (you want to know when something has gone wrong), and it should constantly verify that the right thing is going on. We want our problems to be user-visible, not silent and invisible. Having high standards will hopefully pay off when people start running it in production. The 3-byte int: “Does this mean that some of my tables will grow from 3GB to 4GB on disk?” I think we’re moving the responsibility down to the engines. The 3-byte int type says two things: use less storage space, and limit the maximum value. Often you want the former, not the latter. There are many ways to pack integers more efficiently for storage when they are usually smaller than the maximum you want; the protobuf library does a good job of it. I think it is the job of storage engines to do better here. Once you’re in memory, 3-byte numbers are horrible to work with: copy out of a row buffer, convert into a 32-bit number, and then do foo. Modern CPUs favor 32- or 64-bit alignment of data a *lot*. 3-byte numbers do not align to 32 or 64 bits very well… making things much slower for the common case of using cached data. “I need stored procedures.
They are required for high-performance OLTP as they minimize transaction duration for multi-statement transactions.” The reduction of network round trips is always crucial. I think a lot of round trips could go away if you could issue multiple statements at once (not by separating them with semicolons, but through protocol awesomeness). There should be a way to send a set of statements to be executed. There should also be a way to specify "if no error occurred, commit." This could then be (in the common case) a single round trip to the database. You then only have to make round trips when the next statement to issue depends on the result of a previous one. The next step is to reduce these round trips… which can be done either by executing something inside the database server (e.g. stored procedures) or by executing something closer to the database server so that the round trips aren’t as large. This is where Gearman enters. I’m interested to see where these two approaches (issuing in batches and executing closer to the DB server) fall down… I know that latency may not be as good… but throughput should be a lot better. I take heart from “I have yet to use them in MySQL,” though. I have my own theories as to why this is… my biggest thought is that the many, many programmers writing SQL that Mark sees aren’t SQL stored procedure programmers. They spend their days in a couple of languages (maybe Perl, Python, PHP, Java, C, C++) and have never programmed SQL:2003 stored procedures, and it just doesn’t come as quickly (or as bug-free) as writing code in the languages you use every day. “Long-running insert, update and delete statements consume too many resources in InnoDB.” I wonder if this desire for MyISAM could be filled by PBXT or BlitzDB? The main reason that MyISAM is currently a temporary-table-only engine is that MyISAM and the server core were never that well separated.
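Python's sqlite3 module can sketch the "send a set of statements and commit only if no error occurred" idea. This shows only the all-or-nothing semantics. sqlite3 is an embedded library, so there is no network round trip to save, and neither it nor this sketch implements the protocol-level batching described above. `execute_batch` and the `accounts` table are hypothetical.

```python
import sqlite3
from typing import Sequence


def execute_batch(conn: sqlite3.Connection, statements: Sequence[str]) -> bool:
    """Run the statements as one unit: commit if all succeed, otherwise roll back."""
    try:
        with conn:                  # sqlite3: commits on success, rolls back on an exception
            for stmt in statements:
                conn.execute(stmt)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
execute_batch(conn, ["INSERT INTO accounts VALUES (1, 100)",
                     "INSERT INTO accounts VALUES (2, 50)"])

# The second statement violates the primary key, so the UPDATE is rolled back too.
ok = execute_batch(conn, ["UPDATE accounts SET balance = balance - 30 WHERE id = 1",
                          "INSERT INTO accounts VALUES (2, 0)"])
print(ok, conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone())
```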
My ultimate wish is that all engine authors take the approach that their engine has an API of its own and the storage engine is merely glue between the database server and that API. The BlitzDB engine has this, Innobase partially does (and my Embedded InnoDB work goes the whole way), and MySQL Cluster is likely the oldest example. As a side note, the BlitzDB plugin should go into the main Drizzle tree fairly soon. One of the joys of having an optional plugin that doesn’t touch the core of the server is that we can do this without much worry at all. “Does Drizzle build on Windows?” Well… no. Funnily enough, though, it is increasingly easy to make a Windows port. All the platform-specific things are increasingly just plugins. The build system is a sticker… and no, we’re not going to switch to CMake. The C stands for something, and it’s something that even I may not print here… (I had never thought that being able to open up automake-generated Makefiles and look at them would be a feature.) This next Drizzle milestone release should be exciting, though… I look forward to having Drizzle widely deployed and relied upon… I think we’ll do well.
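On the 3-byte int point earlier in this post: the protobuf-style encoding is a base-128 "varint". Each byte carries 7 bits of the value, and its high bit says whether more bytes follow. Below is a pure-Python sketch of that encoding (not protobuf's or Drizzle's code). Next to it is the fixed 3-byte unpacking step a server must do before it can compute with such a value. The function names are hypothetical.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Base-128 varint (as used by Protocol Buffers) for a non-negative integer."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F           # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)          # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at `offset`; return (value, next_offset)."""
    result, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def read_int24(row: bytes, offset: int) -> int:
    """Copy a fixed 3-byte little-endian unsigned int out of a row buffer."""
    return int.from_bytes(row[offset:offset + 3], "little")


print(encode_varint(300))           # b'\xac\x02'
print(len(encode_varint(2**24 - 1)))  # 4
```

Small values take a single byte, but values near the 3-byte maximum (2**24 - 1) take four. Varints save space only when most values are small, which matches the "usually smaller than the maximum" caveat.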
15 months – And it is done
Sun, 14 Mar 2010 22:28:00 +0000 - Finally: not quite 65 million years in the making (Jurassic Park, hint, hint), but it took about 15 months to get my first book to the printer. A few days ago, Udo (my co-author) and I approved the final version of the MySQL Admin Cookbook for publishing. From what I see, the book has not yet been added consistently to the online bookstores around the net, but I will most certainly put links on here.
Writing another book: Pentaho Kettle Solutions
Sun, 14 Mar 2010 20:35:00 +0000 - Last year, at about this time of the year, I was deeply involved in writing the book "Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL" for Wiley. To date, "Pentaho Solutions" is still the only all-round book on the open source Pentaho Business Intelligence suite. It was an extremely interesting project to participate in, full of new experiences. Although the act of writing was time-consuming and at times very trying for me as well as my family, it was completely worth it. I have nothing but happy memories of the collaboration with my full co-author Jos van Dongen, our technical editors Jens Bleuel, Jeroen Kuiper, Tom Barber and Tomas Morgner, several of the Pentaho developers, and last but not least, the team at Wiley, in particular Robert Elliot and Sara Shlaer. When the book was finally published in late August 2009, I was very proud; as a matter of fact, I still am :) Both Jos and I have been rewarded with a lot of positive feedback, and so far, book sales are meeting the expectations of the publisher. We've had mostly positive reviews on places like Amazon, and elsewhere on the web. I'd like to use this opportunity to thank everybody who took the time to review the book: thank you all. It is very rewarding to get this kind of feedback, and I appreciate enormously that you all took the time to spread the word. Beer is on me next time we meet :)
Announcing "Pentaho Kettle Solutions"
In the autumn of 2009, just a month after "Pentaho Solutions" was published, Wiley contacted Jos and me to find out if we were interested in writing a more specialized book on ETL and data integration using Pentaho. I felt honoured, and I took the fact that Wiley, an experienced and well-renowned publisher in the field of data warehousing and business intelligence, voiced interest in another Pentaho book by Jos and me as a token of confidence and encouragement that I value greatly.
(For Pentaho Solutions, we heard that Wiley was interested, but we contacted them.) At the same time, I admit I had my share of doubts, with the memories of what it took to write Pentaho Solutions still fresh in my mind. As it happens, Jos and I both attended the 2009 Pentaho Community Meeting, and there we seized the opportunity to talk to Matt Casters, chief of Pentaho Data Integration and founding developer of Kettle (a.k.a. Pentaho Data Integration). Neither Jos nor I expected Matt to be able to free up any time in his ever-busy schedule to help us write the new book. Needless to say, he made us both very happy when he rather liked the idea and expressed immediate interest in becoming a full co-author! Together, the three of us made a detailed outline and wrote a formal proposal for Wiley. Our proposal was accepted in December 2009, and we have been writing since, focusing on the forthcoming Kettle version, Kettle 4.0. The tentative title of the book is Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration. It is planned to be published in September 2010, and it will have approximately 750 pages. Our working copy of the outline is quite detailed but may still change, which is why I won't publish it here until we have finished our first draft of the book.
I am 99% confident that the top level of the outline is stable, and I have no reservations about releasing that already:
Part I: Getting Started
  ETL Primer
  Kettle Concepts
  Installation and Configuration
  Sample ETL Solution
Part II: ETL Subsystems
  Overview of the 34 Subsystems of ETL
  Data Extraction
  Cleansing and Conforming
  Handling Dimension Tables
  Fact Tables
  Loading OLAP Cubes
Part III: Management and Deployment
  Testing and Debugging
  Scheduling and Monitoring
  Versioning and Migration
  Lineage and Auditing
  Securing your Environment
  Documenting
Part IV: Performance and Scalability
  Performance Tuning
  Parallelization and Partitioning
  Dynamic Clustering in the Cloud
  Realtime and Streaming Data
Part V: Integrating and Extending Kettle
  Pentaho BI Integration
  Third-party Kettle Integration
  Extending Kettle
Part VI: Advanced Topics
  Webservices and Web APIs
  Complex File Handling
  Data Vault Management
  Working with ERP Systems
Feel free to ask me any questions about this new book. If you're interested, stay tuned: I will probably be posting 2 or 3 updates as we go.
O’Gara Cloud Computing Article Off Base
Sun, 14 Mar 2010 19:30:26 +0000 - Maureen O’Gara, self-described as “the most read technology reporter for the past 20 years”, has written an article about Drizzle at Rackspace for one of Sys-con’s online zines, Cloud Computing Journal, of which she is an editor. I tried commenting on Maureen’s article on their website, but the login system is apparently borked, at least for registered users who use OpenID, for which it still wants a separate user ID and login. Note to sys-con.com: OpenID is designed so that users don’t have to remember yet another login for your website. Besides the fact that I have little patience for content-sparse websites that simply provide an online haven for dozens of Flash advertisements per web page, the article had some serious problems, not the least of which was its use of large chunks of my Happiness is a Warm Cloud article without citation. Very professional. OK, to start with, let’s take this quote from the article:
Drizzle runs the risk of not being as stable as MySQL, because the Drizzle team is taking things out and putting other stuff in. Of course it may be successful in trying to create a product that’s more stable than MySQL. But creating a stable DBMS engine is something that has always taken years and years.
This is just about the most naïve explanation for whether a product will or will not be stable that I’ve ever read. If Maureen had bothered to email or call any one of the core Drizzle developers, they’d have been happy to tell her what is and is not stable about Drizzle, and why. Drizzle has not changed the underlying storage engines, so the InnoDB storage engine in Drizzle is the same plugin as available in MySQL (version 1.0.6).
The pieces of MySQL which were removed from Drizzle happen to be the parts of MySQL which have had the most stability issues, namely the additional features added to MySQL 5.0: stored procedures, views, triggers, stored functions, the INFORMATION_SCHEMA implementation, and server-side cursors and prepared statements. In addition to these removed features, Drizzle also has no built-in Query Cache, does not support any character set other than UTF-8, and has removed the MySQL replication system and binary logging, moving a rewrite of these pieces out into the plugin ecosystem. The pieces that were added to Drizzle have mostly been added as plugins that provide functionality. Maureen, the reason this was done was precisely to allow for greater stability of the kernel by segregating new features and functionality into the plugin ecosystem, where they can be properly versioned and quarantined, therefore increasing kernel stability. It’s pretty much the biggest principle of Drizzle’s design… The core developers of Drizzle (and much of the Drizzle community) would also have been happy to tell Maureen how the Drizzle team defines “stability”: Drizzle is stable when the community says it is stable, simple as that. OK, so the next thing I took exception to is the following line:
Half of Rackspace’s customers are on MySQL so there’ll be some donkey-style nosing to get them to migrate.
I think my Rackspace colleagues might have quite a bit to say about the above. I haven’t seen any Rackers talking about mass migration from MySQL to Drizzle. As far as I have seen, the plan is to provide Drizzle as an additional service to Rackspace customers.
Rackspace evidently wants its new boys, who were not the core pillars of the MySQL engineering team, to hitch MySQL, er, Drizzle to Cassandra
MySQL != Drizzle. Implying that the two are equal does a disservice to both, as they have very different target markets and developer audiences.
The smart money is betting that even if a good number of high-volume web sites go down this route, an even higher number such as Facebook and Google will continue with relational databases, primarily MySQL.
Again, probably best to do your homework on this one, too. Facebook runs an amalgamation of a custom MySQL version and storage engines, distributed key-value stores, and Memcached servers. I would think that Facebook moving to Drizzle would be one tough migration. Thousands (tens of thousands?) of MySQL servers, all running custom software and integrated into their caching layers, is a huge barrier to entry, and not one I would expect a large site like Facebook to casually undertake. But the same could be said about a move to SQL Server or Oracle, for that matter, and it has little to do with Drizzle. Google is moving away from using MySQL entirely. Mark Callaghan, previously at Google, has moved over to Facebook (possibly because of this trend at Google to get rid of MySQL), and Anthony Curtis, formerly of MySQL, then Google, left Google partially for this reason. OK, so the next quote got me really fired up because it demonstrates a complete lack of understanding (maybe not Maureen’s, but at least that of the unnamed source it came from):
Somebody – sorry we forget who exactly – claimed that as GPL 2 code Drizzle “severely limits revenue opportunities. For Rackspace, the opportunity to have some key Drizzle developers on its payrolls basically comes down to a promotional benefit, trying to position Rackspace as particularly Drizzle-savvy in the eyes of the community and currying favor for its seemingly generous contributions. What’s unclear is whether they may develop some Drizzle-related functionality that they will then not release as open source and just rent out to Rackspace hosting customers…that would be a way for them to differentiate themselves from competitors and GPLv2 would in principle allow this.”
A few points to make about the above quote.
First, name your source. I find it difficult to believe that the most-read technology writer would not write down a source. Is it the same person you deliberately left out of a quote from my Happiness article? (Why did you do that, btw?)

Second, the MySQL server source code is licensed under the GPL 2, and so is Drizzle’s kernel, because it is a derivative work of the MySQL server. Let me be clear: developers who contribute code to Drizzle do so under the GPLv2 if that contribution is in the Drizzle kernel. If the code contribution is a plugin, the contributor is free to pick whatever license they choose.

Third, licensing has little, if anything, to do with revenue at all. The license is beside the point. There are two things which dictate how a company derives revenue from software:

1. Copyright ownership
2. Principles of the company

Drizzle, Rackspace, or any company a Drizzle contributor works for, does not own the copyright to the MySQL source code, from which Drizzle’s kernel is derived. Oracle does. Therefore, companies do not have any right to re-sell Drizzle (under any license) without explicit permission from Oracle. Period. It has nothing to do with the GPLv2. That said, contributors do have the right to make money on plugins built for the Drizzle server, and Rackspace, while not having expressed any interest to yours truly in doing so, has the right, like any other Drizzle contributor, to make money on plugins its contributors create for Drizzle. It is my understanding (after actually having talked to Rackspace managers and decision makers) that Rackspace is not interested in getting into the business of selling commercial Drizzle plugins. Their core direction is to create value for their customers, and I fail to see how getting into the commercial software sales business meets that goal.

Next time, please feel free to contact me or any other Drizzle contributor to get the low-down on Drizzle-related stuff. We’ll be nice. I promise.
[MySQL][Spider][VP] Spider-2.16 Vertical Partitioning-0.9 released
Sun, 14 Mar 2010 19:04:00 +0000 - I'm pleased to announce the release of Spider storage engine version 2.16 (beta) and Vertical Partitioning storage engine version 0.9 (beta).

Spider is a Storage Engine for database sharding.
http://spiderformysql.com/

Vertical Partitioning is a Storage Engine for vertical partitioning of a table.
http://launchpad.net/vpformysql

The main changes in this version are the following. (This release of Vertical Partitioning is a bug-fix release.)
- Add table parameter "semi_split_read".
- Add server parameter "spider_semi_split_read".
These parameters are for improving search performance.

Please see "99_change_logs.txt" in the download documents for other changes.

Enjoy!
Thoughts on Drizzle
Sun, 14 Mar 2010 18:16:00 +0000 - I wish the case for Drizzle could be made without bashing MySQL. Sometimes it is, but too often it isn't. I guess this is karma. This isn't a rant against Drizzle. This is a rant against pulling up Drizzle by pushing down MySQL. I occasionally have negative things to say about MySQL, but I usually say them to get the problems fixed. We have lots of complaints about MySQL because we use it in production.

What have I learned about the Drizzle vision?

“Drizzle will re-think everything.” Alas, I have problems to solve today. While I am passionate about doing things correctly, I am also aware that compromises must be made to get things done. Some of those compromises turn out to be mistakes. It isn't always possible to know which compromises will turn out to be a mistake. Nor is it always possible to identify the right thing.

“You hate MySQL replication? You now love Drizzle.” I love what Drizzle might do for replication. I love what MySQL is doing with replication. I can't compare the two until Drizzle replication is running in production.

“MySQL has a data type called a 3-byte integer. Think about that for a moment. On today’s server hardware that does not make a whole lot of sense.” I thought about it. Does this mean that some of my tables will grow from 3G to 4G on disk? I won't be happy if that is the result.

“No triggers or stored procedures. That stuff is bloat as done in MySQL, and Drizzle has other ways to deal with these needs. These capabilities can be added in later as needed such that they are done right.” I need stored procedures. They are required for high-performance OLTP as they minimize transaction duration for multi-statement transactions. Alas, I have yet to use them in MySQL.

“MyISAM is gone. Long live the Queen!” Alas, I need MyISAM. Long-running insert, update and delete statements consume too many resources in InnoDB. Such statements are used for reporting jobs on slaves, and in that case I want to use InnoDB for production tables and MyISAM for transient tables.

“Ever tried to compile MySQL from source? Hah! Yeah, drizzle builds like butter.” I have no problems building MySQL from source. I have had more problems building Drizzle because it has a few more dependencies (Google protobufs, libdrizzle). But both are easy to build and nobody cares too much in either case, with one exception. Does Drizzle build on Windows?
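The 3G-to-4G worry above is simple arithmetic: MEDIUMINT, the 3-byte integer, stores each value in 3 bytes while INT uses 4, so column data made only of 3-byte integers grows by a third when widened. A minimal sketch of that calculation, assuming the table size is dominated by such integer columns and deliberately ignoring row headers, page overhead and indexes; `columnBytes` and the constants are illustrative names, not part of MySQL or Drizzle:

```cpp
#include <cstdint>

// Storage sizes from the MySQL manual: MEDIUMINT uses 3 bytes, INT uses 4.
constexpr std::uint64_t kMediumIntBytes = 3;
constexpr std::uint64_t kIntBytes = 4;

// Largest unsigned MEDIUMINT value: 2^24 - 1.
constexpr std::uint64_t kMediumIntUnsignedMax = (1ULL << 24) - 1;

// Raw bytes for `rows` rows holding `cols` integer columns of `width` bytes.
// Row headers, page overhead and indexes are ignored on purpose, so this is
// only the column-data part of a table's size.
constexpr std::uint64_t columnBytes(std::uint64_t rows, std::uint64_t cols,
                                    std::uint64_t width) {
    return rows * cols * width;
}

constexpr std::uint64_t kRows = 1000000000ULL;  // one billion values

static_assert(columnBytes(kRows, 1, kMediumIntBytes) == 3000000000ULL,
              "3 bytes per value: about 3 GB");
static_assert(columnBytes(kRows, 1, kIntBytes) == 4000000000ULL,
              "4 bytes per value: about 4 GB");
```

For one billion values the sketch gives 3,000,000,000 bytes as MEDIUMINT and 4,000,000,000 bytes as INT, which is the 3G versus 4G difference in the post; how much a real table grows depends on what share of each row those columns make up.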
Drizzle on the Rackspace Cloud Blog
Sun, 14 Mar 2010 06:30:52 +0000 - Adrian has talked about a few of us Drizzle Hackers joining Rackspace over at the Rackspace Cloud Blog.
Drizzle Developer Day 2010
Sun, 14 Mar 2010 05:46:16 +0000 - Hi one and all! Interested in database systems? Interested because you use them? Because you manage them? Write SQL that goes to them? Or are you one of the people of questionable sanity like myself who develop them? Well… do we have the offer for you.

Friday, April 16th. Right after the MySQL Conference and Expo at the Santa Clara Convention Center, you can come along to the Drizzle Developer Day.

You will want to add your name to this wiki page: http://drizzle.org/wiki/Drizzle_Developer_Day_2010_signup

Suggest topics over at: http://drizzle.org/wiki/Drizzle_Developer_Day_2010

Hope to see you there!
Revisiting and defending my Tweets from NoSQL Live in Boston
Sat, 13 Mar 2010 20:38:00 +0000 - Some people have been confused by my Twitter stream from last week's NoSQL Live in Boston conference. I've never been very good at staying "on message", or adhering to a set of "talking points". Some people thought I was there to "defend the status quo", or to defend and promote memcached, or memcached+mysql.

True, I was there in part to teach about and promote memcached, and especially Gear6 Memcached. I have a great employer, but they are not sending me to conferences because they are a charity. I taught the memcached breakout session, and also worked the "hallway track", giving a number of people a crash course in what memcache is and what makes it useful, and handed out a pile of business cards.

As for my tweets and pithy statements... Well, some over-simplification has to happen to reduce a concept to 140 characters.

My statement, "NoSQL as cheaper to deploy, manage, and maintain is a myth. it costs just as much, if not more", which I said into the mike at the start of the scaling panel, was very popular to tweet and retweet.

It's something that is a "Jedi Truth": it's true from a "certain point of view". When you add together the costs of the much shallower "bench" of hirable operational experience, the increased "minimum useful hardware footprint" (which seems to be at least 5 full storage nodes), the evolving maturity of the client libraries, and such, NoSQL is not going to save you money. Until an important threshold is reached: when you scale and/or your data representation "impedance mismatch" hits that nasty inflection point, where the "buck for bang" curve suddenly starts to rise hard, and it looks like you will need to start spending infinite amounts of money to keep growing. Then the NoSQL approach does become cheaper, because it's actually doable.
And it's probably wise to start considering, researching, and then migrating to NoSQL before you hit that wall.

My statement "people have been wanting DBA-less databases about as long as they have wanted programmer-less programming languages" was also popular. I stand by it. NoSQL doesn't crack this nut; nothing ever will. Some NoSQL solutions look like they are "DBA-less", such as AWS SDB, AWS RDB, FluidDB. Those systems are not DBA-less. They have DBAs; it's just that the cost of the DBA is "hidden" in the per-drink rental cost of those systems, instead of sitting on your balance sheet as a salary.

The statement "Twitter is using Cassandra because bursty writes are cheap, compared to others" is something I said not because I knew it, but because I had just learned it, and I was a bit surprised by it. I think that the original statement was by Ryan King of Twitter, who was also on the scaling panel.

My statement "Memcached should be integrated into all NoSQL stores" is something I also firmly believe. The very-high-performance in-memory distributed key value store is a very useful building block for larger systems, and I think that whatever larger NoSQL systems we end up with will use it as a component of their internal implementations.

The statement "being able to drop nodes as important as being able to add, because scalability is pointless w/o reliability" was also by Ryan King. I tweeted it because it is very much worth broadcasting and remembering. It has a little more context in his next statement: "The first day we stood up our #cassandra cluster, 2 nodes had hdd die... Clients never noticed." Machines fail. And as they get faster and cheaper, and as clusters get bigger, machine failure must stop being any sort of emergency.

My statement "open source means folks dont need a standards body" was an extreme simplification of part of Sandro Hawke's talk.
I tweeted it because it was something I've felt to be mostly true for a long time, and it was nice to see someone else recognize it. As Sandro stated later on Twitter, "I think I added an important "sometimes"!". And he is correct. It's not true as an absolute statement.

All in all, NoSQL Live was a very good conference. I felt that the speakers taught and learned, all the other attendees taught and learned, and the networking and hallway track was first rate. Thanks to 10gen for organizing it, and being entirely fair to their "competition" in the rapidly growing and evolving NoSQL space.
Recent Work on Improving Drizzle’s Storage Engine API
Sat, 13 Mar 2010 07:07:59 +0000 - Over the past six weeks or so, I have been working on cleaning up the pluggable storage engine API in Drizzle. I’d like to describe some of this work and talk a bit about the next steps I’m taking in the coming months as we roll towards implementing Log Shipping in Drizzle. First, how did it come about that I started working on the storage engine API?

From Commands to Transactions

Well, it really goes back to my work on Drizzle’s replication system. I had implemented a simple, fast, and extensible log which stored records of the data changes made to a server. Originally, the log was called the Command Log, because the Google Protobuffer messages it contained were called message::Commands. The API for implementing replication plugins was very simple, and within a month or so of debuting the API, quite a few replication plugins had been built, including one replicating to Memcached, a prototype one replicating to Gearman, and a filtering replicator plugin. In addition, Marcus Eriksson had created the RabbitReplication project, which could replicate from Drizzle to other data stores, including Cassandra and Project Voldemort. However, Marcus did not actually implement any C/C++ plugins using the Drizzle replication API. Instead, RabbitReplication simply read the new Command Log which, being just a file full of Google Protobuffer messages, was quick and easy to read into memory using a variety of different programming languages. RabbitReplication is written in Java, and it was great to see other programming languages be able to read Drizzle’s replication log so easily. Marcus later coded up a C++ TransactionApplier plugin which replaces the Drizzle replication log and instead replicates the GPB messages directly to RabbitMQ. And there, you’ll note that one of the plugins involved in Drizzle’s replication system is called TransactionApplier. It used to be called CommandApplier.
That was because the GPB Command messages were individual row change events for the most part. However, I made a series of changes to the replication API, and now the GPB messages sent through the APIs are of class message::Transaction. message::Transaction objects contain a transaction context, with information about the transaction’s start and end time and its transaction identifier, along with a series of message::Statement objects, each of which represents a part of the data changes that the SQL transaction made. Thus, the Command Log turned into the Transaction Log, and everywhere the term Command was used was replaced with the terms Transaction and Statement (depending on whether you were talking about the entire Transaction or a piece of it). Log entries were now written to the Transaction Log at COMMIT and were not written if no COMMIT occurred[1]. After finishing this work to make the transaction log write Transaction messages at commit time, I was keen to begin coding up the publisher and subscriber plugins which represent a node in the replication environment. However, Brian had asked me to delay working on other replication features and ensure that the replication API could support fully distributed transactions via the X/Open XA distributed transaction protocol. XA support had been removed from Drizzle when the MySQL binlog and original replication system were ripped out, and it needed some TLC. Fair enough, I said. So, off I went to work on XA.

If Only It Were Simple…

As anyone who has worked on the MySQL source code or developed storage engines for MySQL knows, working with the MySQL pluggable storage engine API is sometimes not the easiest or most straightforward thing. I think the biggest problem with the MySQL storage engine API is that, due to understandable historical reasons, it’s an API that was designed with the MyISAM and HEAP storage engines in mind.
Many of the transactional pieces of the API seem to be a bolted-on afterthought and can be very confusing to work with. As an example, Paul McCullagh, developer of the transactional storage engine PBXT, recently emailed the MySQL internals mailing list asking how the storage engine could tell when a SQL statement started and ended. You would think that such seemingly basic functionality would have a simple answer. You’d be wrong. Monty Widenius answered like this:

“Why not simply have a counter in your transaction object for how start_stmt – reset(); When this is 0 then you know stmnt ended. In Maria we count number of calls to external_lock() and when the sum goes to 0 we know the transaction has ended.”

To this, Mark Callaghan responded:

“Why does the solution need to be so obscure?”

Monty answered (emphasis mine):

“Historic reasons. MySQL never kept a count of which handlers are used by a transaction, only which tables. So the original logic was that external_lock(lock/unlock) is called for each usage of the table, which is normally more than enough information for a handler to know when a statement starts/ends. The one case this didn’t work was in the case someone does lock tables as then external_lock is not called per statement. It was to satisfy this case that we added a call to start_stmt() for each table. It’s of course possible to change things so that start_stmt() / end_stmt() would be called once per used handler, but this would be yet another overhead for the upper level to do which the current handlers that tracks call to external_lock() doesn’t need.”

Well, in Drizzle-land, we aren’t beholden to “historic reasons”. So, after looking through the in-need-of-attention transaction processing code in the kernel, I decided that I would clean up the API so that storage engines did not have to jump through hoops to notify the kernel they participate in a transaction, or just to figure out when a statement and a transaction started and ended.
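To make the difference concrete, here is a toy model built from heavily simplified, hypothetical classes (`CountingEngine`, `ExplicitEngine`, `externalLock`), not the real MySQL handler or Drizzle Cursor signatures. The first engine infers statement boundaries the way Monty describes, by counting external_lock() lock and unlock calls; the second simply receives explicit start and end calls from the kernel:

```cpp
#include <string>
#include <vector>

// Old style (hypothetical sketch): the engine infers statement boundaries
// by counting lock/unlock calls, one pair per table the statement uses.
class CountingEngine {
public:
    void externalLock(bool lock) {
        if (lock) {
            // The first table locked means a statement has (probably) started.
            if (locked_tables_++ == 0) events_.push_back("statement start (inferred)");
        } else {
            // The count returning to zero means the statement has (probably) ended.
            if (--locked_tables_ == 0) events_.push_back("statement end (inferred)");
        }
    }
    const std::vector<std::string>& events() const { return events_; }

private:
    int locked_tables_ = 0;
    std::vector<std::string> events_;
};

// New style (hypothetical sketch): the kernel calls explicit hooks once per
// statement, no matter how many tables the statement touches.
class ExplicitEngine {
public:
    void doStartStatement() { events_.push_back("statement start"); }
    void doEndStatement() { events_.push_back("statement end"); }
    const std::vector<std::string>& events() const { return events_; }

private:
    std::vector<std::string> events_;
};
```

For a statement touching two tables, the counting engine sees lock, lock, unlock, unlock and infers one start and one end. Under LOCK TABLES, however, external_lock() is not called per statement, so the counter never returns to zero between statements; that is the exception start_stmt() was added for, and the kind of special case explicit hooks remove.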
The resulting changes to the API are quite dramatic, I think, but I’ll leave it to the storage engine developers to tell me if the changes are good or not. The following is a summary of the changes to the storage engine API that I committed in the last few weeks.

plugin::StorageEngine Split Into Subclasses

The very first thing I did was to split the enormous base plugin class for a storage engine, plugin::StorageEngine, into two other subclasses containing transactional elements. plugin::TransactionalStorageEngine is now the base class for all storage engines which implement SQL transactions:

/**
 * A type of storage engine which supports SQL transactions.
 *
 * This class adds the SQL transactional API to the regular
 * storage engine. In other words, it adds support for the
 * following SQL statements:
 *
 * START TRANSACTION;
 * COMMIT;
 * ROLLBACK;
 * ROLLBACK TO SAVEPOINT;
 * SET SAVEPOINT;
 * RELEASE SAVEPOINT;
 */
class TransactionalStorageEngine :public StorageEngine
{
public:
  TransactionalStorageEngine(const std::string name_arg,
                             const std::bitset<HTON_BIT_SIZE> &flags_arg= HTON_NO_FLAGS);

  virtual ~TransactionalStorageEngine();
  ...
private:
  void setTransactionReadWrite(Session& session);

  /*
   * Indicates to a storage engine the start of a
   * new SQL transaction. This is called ONLY in the following
   * scenarios:
   *
   * 1) An explicit BEGIN WORK/START TRANSACTION is called
   * 2) After an explicit COMMIT AND CHAIN is called
   * 3) After an explicit ROLLBACK AND RELEASE is called
   * 4) When in AUTOCOMMIT mode and directly before a new
   *    SQL statement is started.
   */
  virtual int doStartTransaction(Session *session, start_transaction_option_t options)
  {
    (void) session;
    (void) options;
    return 0;
  }

  /**
   * Implementing classes should override these to provide savepoint
   * functionality.
   */
  virtual int doSetSavepoint(Session *session, NamedSavepoint &savepoint)= 0;
  virtual int doRollbackToSavepoint(Session *session, NamedSavepoint &savepoint)= 0;
  virtual int doReleaseSavepoint(Session *session, NamedSavepoint &savepoint)= 0;

  /**
   * Commits either the "statement transaction" or the "normal transaction".
   *
   * @param[in] The Session
   * @param[in] true if it's a real commit, that makes persistent changes
   *            false if it's not in fact a commit but an end of the
   *            statement that is part of the transaction.
   * @note
   *
   * 'normal_transaction' is also false in auto-commit mode where 'end of statement'
   * and 'real commit' mean the same event.
   */
  virtual int doCommit(Session *session, bool normal_transaction)= 0;

  /**
   * Rolls back either the "statement transaction" or the "normal transaction".
   *
   * @param[in] The Session
   * @param[in] true if it's a real commit, that makes persistent changes
   *            false if it's not in fact a commit but an end of the
   *            statement that is part of the transaction.
   * @note
   *
   * 'normal_transaction' is also false in auto-commit mode where 'end of statement'
   * and 'real commit' mean the same event.
   */
  virtual int doRollback(Session *session, bool normal_transaction)= 0;

  virtual int doReleaseTemporaryLatches(Session *session)
  {
    (void) session;
    return 0;
  }

  virtual int doStartConsistentSnapshot(Session *session)
  {
    (void) session;
    return 0;
  }
};

As you can see, plugin::TransactionalStorageEngine inherits from plugin::StorageEngine and extends it with a series of private pure virtual methods that implement the SQL transaction parts of a query: doCommit(), doRollback(), etc. Implementing classes simply inherit from plugin::TransactionalStorageEngine and implement their internal transaction processing in these private methods. In addition to the SQL transaction, however, there is the concept of an XA transaction, which is for distributed transaction coordination.
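The split between public entry points and private do*() hooks follows the non-virtual interface idiom: the kernel calls a public method on the base class, which forwards to the private virtual that each engine overrides. Below is a minimal sketch of that shape, assuming a stand-in `Session` type and hypothetical public wrappers named `startTransaction()`, `commit()` and `rollback()`; the call path later in the post shows a public commit() forwarding to doCommit(), but the exact Drizzle signatures may differ:

```cpp
#include <string>
#include <vector>

struct Session {};  // hypothetical stand-in for drizzled::Session

// Simplified stand-in for plugin::TransactionalStorageEngine: public,
// non-virtual entry points that the kernel calls, each forwarding to a
// private virtual "do" hook that engines override.
class TransactionalEngineSketch {
public:
    virtual ~TransactionalEngineSketch() = default;
    int startTransaction(Session* s) { return doStartTransaction(s); }
    int commit(Session* s, bool normal_transaction) { return doCommit(s, normal_transaction); }
    int rollback(Session* s, bool normal_transaction) { return doRollback(s, normal_transaction); }

private:
    virtual int doStartTransaction(Session*) { return 0; }  // optional hook
    virtual int doCommit(Session*, bool normal_transaction) = 0;
    virtual int doRollback(Session*, bool normal_transaction) = 0;
};

// A toy engine that only records which hooks the kernel invoked.
class RecordingEngine : public TransactionalEngineSketch {
public:
    std::vector<std::string> log;

private:
    int doStartTransaction(Session*) override {
        log.push_back("start");
        return 0;
    }
    int doCommit(Session*, bool normal) override {
        log.push_back(normal ? "commit transaction" : "commit statement");
        return 0;
    }
    int doRollback(Session*, bool normal) override {
        log.push_back(normal ? "rollback transaction" : "rollback statement");
        return 0;
    }
};
```

Because the hooks are private, code outside the class hierarchy can only reach them through the base class's public methods, so the kernel stays in control of when and in what order they run.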
The XA protocol is a two-phase commit protocol because it implements a PREPARE step before a COMMIT occurs. This XA API is exposed via two other classes, plugin::XaResourceManager and plugin::XaStorageEngine. plugin::XaResourceManager derived classes implement the resource manager API of the XA protocol. plugin::XaStorageEngine is a storage engine subclass which implements XA transactions in addition to SQL transactions. Here is the plugin::XaResourceManager class:

/**
 * An abstract interface class which exposes the participation
 * of implementing classes in distributed transactions in the XA protocol.
 */
class XaResourceManager
{
public:
  XaResourceManager() {}
  virtual ~XaResourceManager() {}
  ...
private:
  /**
   * Does the COMMIT stage of the two-phase commit.
   */
  virtual int doXaCommit(Session *session, bool normal_transaction)= 0;
  /**
   * Does the ROLLBACK stage of the two-phase commit.
   */
  virtual int doXaRollback(Session *session, bool normal_transaction)= 0;
  /**
   * Does the PREPARE stage of the two-phase commit.
   */
  virtual int doXaPrepare(Session *session, bool normal_transaction)= 0;
  /**
   * Rolls back a transaction identified by a XID.
   */
  virtual int doXaRollbackXid(XID *xid)= 0;
  /**
   * Commits a transaction identified by a XID.
   */
  virtual int doXaCommitXid(XID *xid)= 0;
  /**
   * Notifies the transaction manager of any transactions
   * which had been marked prepared but not committed at
   * crash time or that have been heuristically completed
   * by the storage engine.
   *
   * @param[out] Reference to a vector of XIDs to add to
   *
   * @retval
   *  Returns the number of transactions left to recover
   *  for this engine.
   */
  virtual int doXaRecover(XID * append_to, size_t len)= 0;
};

and here is the plugin::XaStorageEngine class:

/**
 * A type of storage engine which supports distributed
 * transactions in the XA protocol.
 */
class XaStorageEngine :public TransactionalStorageEngine,
                       public XaResourceManager
{
public:
  XaStorageEngine(const std::string name_arg,
                  const std::bitset<HTON_BIT_SIZE> &flags_arg= HTON_NO_FLAGS);

  virtual ~XaStorageEngine();
  ...
};

Pretty clear. A plugin::XaStorageEngine inherits from both plugin::TransactionalStorageEngine and plugin::XaResourceManager because it implements both SQL transactions and XA transactions. The InnobaseEngine is a plugin which inherits from plugin::XaStorageEngine because InnoDB supports SQL transactions as well as XA.

Explicit Statement and Transaction Boundaries

The second major change I made addressed the problem that Mark Callaghan noted in asking why finding out when a statement starts and ends was so obscure. I added two new methods to plugin::StorageEngine called doStartStatement() and doEndStatement(). The kernel now explicitly tells storage engines when a SQL statement starts and ends. This happens before any calls to Cursor::external_lock() happen, and there are no exception cases. In addition, the kernel now always tells transactional storage engines when a new SQL transaction is starting. It does this via an explicit call to plugin::TransactionalStorageEngine::doStartTransaction(). No exceptions, and yes, even for DDL operations. What this means is that a transactional storage engine no longer needs to “count the calls to Cursor::external_lock()” in order to know when a statement or transaction starts and ends. For a SQL transaction, this means that there is a clear code call path and there is no need for the storage engine to track whether the session is in AUTOCOMMIT mode or not. The kernel does all that work for the storage engine. Imagine a Session executes a single INSERT statement against an InnoDB table while in AUTOCOMMIT mode.
This is what the call path looks like:

drizzled::Statement::Insert::execute()
 |
 -> drizzled::mysql_lock_tables()
 |
 -> drizzled::TransactionServices::registerResourceForTransaction()
 |
 -> drizzled::plugin::TransactionalStorageEngine::startTransaction()
 |
 -> InnobaseEngine::doStartTransaction()
 |
 -> drizzled::plugin::StorageEngine::startStatement()
 |
 -> InnobaseEngine::doStartStatement()
 |
 -> drizzled::plugin::StorageEngine::getCursor()
 |
 -> drizzled::Cursor::write_row()
 |
 -> InnobaseCursor::write_row()
 |
 -> drizzled::TransactionServices::autocommitOrRollback()
 |
 -> drizzled::plugin::TransactionalStorageEngine::commit()
 |
 -> InnobaseEngine::doCommit()

I think this will come as a welcome change to storage engine developers working with Drizzle.

No More Need for Engine to Call trans_register_ha()

There was an interesting comment in the original documentation for the transaction processing code. It read:

Roles and responsibilities
--------------------------

The server has no way to know that an engine participates in the statement and a transaction has been started in it unless the engine says so. Thus, in order to be a part of a transaction, the engine must “register” itself. This is done by invoking trans_register_ha() server call. Normally the engine registers itself whenever handler::external_lock() is called. trans_register_ha() can be invoked many times: if an engine is already registered, the call does nothing. In case autocommit is not set, the engine must register itself twice: both in the statement list and in the normal transaction list.

That comment, and I’ve read it dozens of times, always seemed strange to me. I mean, does the server really not know that an engine participates in a statement or transaction unless the engine tells it? Of course not. So, I removed the need for a storage engine to “register itself” with the kernel.
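The autocommit INSERT call path above can be mimicked with a small model in which the transaction manager, not the engine, keeps track of participants. A minimal sketch, assuming hypothetical simplified types (`Engine`, `TransactionServicesSketch`) that borrow a few method names from the call path but are not the real Drizzle classes: the first time a statement uses an engine, the manager records it and calls its start hooks, and at the end of an autocommit statement it ends the statement and commits every participant. Rollback is left out.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a transactional engine: it only records
// which hooks the kernel called, in order.
struct Engine {
    std::vector<std::string> calls;
    void doStartTransaction() { calls.push_back("doStartTransaction"); }
    void doStartStatement() { calls.push_back("doStartStatement"); }
    void doEndStatement() { calls.push_back("doEndStatement"); }
    void doCommit() { calls.push_back("doCommit"); }
};

// Hypothetical, simplified transaction manager. The engine never registers
// itself; the kernel calls registerResourceForTransaction() whenever a
// statement uses a table stored in that engine.
class TransactionServicesSketch {
public:
    void registerResourceForTransaction(Engine& engine) {
        if (txn_.insert(&engine).second) engine.doStartTransaction();  // first use in transaction
        if (stmt_.insert(&engine).second) engine.doStartStatement();   // first use in statement
    }
    void endStatement() {
        for (Engine* e : stmt_) e->doEndStatement();
        stmt_.clear();
    }
    void commit() {
        for (Engine* e : txn_) e->doCommit();
        txn_.clear();
    }
    // AUTOCOMMIT: the single statement is the whole transaction.
    void autocommit() {
        endStatement();
        commit();
    }
    std::size_t participantCount() const { return txn_.size(); }

private:
    std::set<Engine*> txn_;   // engines in the current transaction
    std::set<Engine*> stmt_;  // engines in the current statement
};
```

Registering the same engine twice in one statement (say, a statement touching two InnoDB tables) does not repeat the hooks, because the manager deduplicates participants itself; nothing like trans_register_ha() is needed on the engine side.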
Now, the transaction manager inside the Drizzle kernel (implemented in the TransactionServices component) automatically monitors which engines are participating in an SQL transaction, and the engine doesn’t need to do anything to register itself. In addition, due to the break-up of the plugin::StorageEngine class and the XA API into plugin::XaResourceManager, Drizzle’s transaction manager can now coordinate XA transactions from plugins other than storage engines. Yep, that’s right. Any plugin which implements plugin::XaResourceManager can participate in an XA transaction, and Drizzle will act as the transaction manager. What’s the first plugin that will do this? Drizzle’s transaction log. The transaction log isn’t a storage engine, but it is able to participate in an XA transaction, so it will implement plugin::XaResourceManager but not plugin::StorageEngine.

Performance Impact of Code Changes

So, that “yet another overhead” Monty talked about in the quote above? There wasn’t any noticeable impact on performance or scalability at all. So much for optimize-first coding.

What’s Next?

The next thing I’m working on is removing the notion of the “statement transaction”, which is also a historical by-product, this time because of BerkeleyDB. Gee, I’ve got a lot of work ahead of me…

[1] Actually, there is a way that a transaction that was rolled back can get written to the transaction log. For bulk operations, the server can cut a Transaction message into multiple segments, and if the SQL transaction is rolled back, a special RollbackStatement message is written to the transaction log.
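The XA coordination described in this post boils down to two-phase commit across a set of participants, one of which need not be a storage engine. A minimal sketch, assuming a hypothetical `XaParticipant` interface as a stand-in for plugin::XaResourceManager, a toy transaction log that only makes changes durable when told to commit, and a `twoPhaseCommit()` coordinator; XIDs, crash recovery and the segmented RollbackStatement case from the footnote are left out:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for plugin::XaResourceManager: anything that can
// take part in two-phase commit, whether or not it is a storage engine.
class XaParticipant {
public:
    virtual ~XaParticipant() = default;
    virtual bool prepare() = 0;   // phase 1: vote yes (true) or no (false)
    virtual void commit() = 0;    // phase 2, success path
    virtual void rollback() = 0;  // phase 2, abort path
};

// Toy transaction log: buffers statements and only "writes" them on commit.
class ToyTransactionLog : public XaParticipant {
public:
    void addStatement(std::string sql) { pending_.push_back(std::move(sql)); }
    bool prepare() override { return true; }
    void commit() override {
        written.insert(written.end(), pending_.begin(), pending_.end());
        pending_.clear();
    }
    void rollback() override { pending_.clear(); }
    std::vector<std::string> written;  // what reached the "log file"

private:
    std::vector<std::string> pending_;
};

// Toy engine whose PREPARE vote can be forced, to show the abort path.
class ToyEngine : public XaParticipant {
public:
    explicit ToyEngine(bool vote_yes) : vote_yes_(vote_yes) {}
    bool prepare() override { return vote_yes_; }
    void commit() override { committed = true; }
    void rollback() override { rolled_back = true; }
    bool committed = false;
    bool rolled_back = false;

private:
    bool vote_yes_;
};

// Coordinator: commit everywhere only if every participant prepared.
bool twoPhaseCommit(const std::vector<XaParticipant*>& participants) {
    for (XaParticipant* p : participants) {
        if (!p->prepare()) {
            for (XaParticipant* q : participants) q->rollback();
            return false;
        }
    }
    for (XaParticipant* p : participants) p->commit();
    return true;
}
```

If any participant votes no during PREPARE, every participant is rolled back and the toy log writes nothing; only when all of them prepare successfully does the second phase commit them all.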
Index only
Sat, 13 Mar 2010 02:29:00 +0000 - A problem with SQL is SQL. It is easy to write queries that require random IO in the worst case. It is usually easy to find queries that do too much random IO on a NoSQL system, as you must code the extra data fetches manually.

Digg has begun to write about their reasons for migrating from MySQL to Cassandra. They provide an excellent summary and then describe a performance problem fixed by the migration. I think Cassandra and a few other members of the NoSQL family are amazing technology, but I don't think a migration was needed to fix this performance problem. A better index on the Diggs table would have done that. Others have said the same thing. Maybe I don't have all of the details. I can only go on what was written in the blog.

You can learn more about the power of indexes at the MySQL conference.

The Diggs table was the source of the problem:

CREATE TABLE Diggs (
  id      INT(11),
  itemid  INT(11),
  userid  INT(11),
  digdate DATETIME,
  PRIMARY KEY (id),
  KEY user  (userid),
  KEY item  (itemid)
) ENGINE=InnoDB;

It supported an important query that was too slow. A simple form of this query is:

SELECT digdate, id
FROM Diggs
WHERE userid in (10, 20, 30) AND itemid = 50
ORDER BY digdate DESC, id DESC LIMIT 4;

This query requires too much random IO because it isn't index only. The query can use either the index on itemid or the index on userid. In both cases it will scan more entries than it needs to from the secondary index and then look up the remaining columns from the primary index. Each lookup on the primary index can do one disk seek. On my test server the plan for this query is:

id  select_type  table  type  possible_keys  key   key_len  ref    rows  Extra
1   SIMPLE       Diggs  ref   user,item      item  5        const  8960  Using where; Using filesort

After running the query I ran SHOW SESSION STATUS LIKE "Handler_read%" and the result from that is below.
The query scanned 5000 entries from the secondary index and would have done more than 5000 disk seeks in the worst case to look up columns from the primary key index.

Variable_name          Value
Handler_read_first     0
Handler_read_key       3
Handler_read_next      5000
Handler_read_prev      0
Handler_read_rnd       0
Handler_read_rnd_next  0

The query is much faster for a table with different indexes:

CREATE TABLE DiggsFast (
  id      INT(11),
  itemid  INT(11),
  userid  INT(11),
  digdate DATETIME,
  PRIMARY KEY (itemid,userid,digdate,id),
  UNIQUE KEY (id)
) ENGINE=InnoDB;

The query has a better plan:

id  select_type  table      type   possible_keys  key      key_len  ref   rows  Extra
1   SIMPLE       DiggsFast  range  PRIMARY        PRIMARY  8        NULL  149   Using where; Using index; Using filesort

It also is much better in reality. The output from SHOW SESSION STATUS LIKE "Handler_read%" is listed below. With a better index the query scans 150 entries from 5 range scans of the index. It should do about 5 disk seeks in the worst case. It is also index only, so it doesn't have to look up other columns after the index scan, although that doesn't matter much in this case because the query uses the primary key index, which has all columns for an InnoDB table. This query will be much faster than the previous one (5 disk seeks versus 5000).

Note that this query uses the first two columns in the primary index for the predicates on itemid and userid. InnoDB stores all columns in the primary key index entries, so any query that uses the PK index is index only.

Variable_name          Value
Handler_read_first     0
Handler_read_key       5
Handler_read_next      150
Handler_read_prev      0
Handler_read_rnd       0
Handler_read_rnd_next  0

UPDATE: I created another variant of the Diggs table that uses a secondary index for the query. InnoDB includes all columns from a PK index in the secondary index to serve as the pointer to the row.
Note there is a difference between being 'in the index' and being indexed.

CREATE TABLE DiggsFast2 (
  id      INT(11),
  itemid  INT(11),
  userid  INT(11),
  digdate DATETIME,
  KEY itemuserdig (itemid,userid,digdate),
  PRIMARY KEY (id)
) ENGINE=InnoDB;

From the query plan:

id  select_type  table       type   possible_keys  key          key_len  ref   rows  Extra
1   SIMPLE       DiggsFast2  range  itemuserdig    itemuserdig  10       NULL  150   Using where; Using index; Using filesort

And SHOW SESSION STATUS LIKE "Handler_read%":

Variable_name          Value
Handler_read_first     0
Handler_read_key       5
Handler_read_next      150
Handler_read_prev      0
Handler_read_rnd       0
Handler_read_rnd_next  0
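Why the composite key helps can be shown with an ordered container standing in for the B-tree. A sketch, assuming a std::set of (itemid, userid, digdate, id) tuples as a stand-in for the DiggsFast primary key and an integer timestamp in place of DATETIME; the function names are illustrative, not anything InnoDB provides. For each value in the userid IN list it seeks once and reads only the matching entries, then sorts the few hits (the "Using filesort" step) and applies the LIMIT. For comparison, `entriesForItem()` counts what an index on itemid alone would read; in the original Diggs table each of those entries also needs a primary-key lookup.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// (itemid, userid, digdate, id): the DiggsFast primary key order.
// digdate is simplified to an integer timestamp.
using Key = std::tuple<int, int, long, int>;

struct ScanResult {
    std::vector<std::pair<long, int>> rows;  // (digdate, id) pairs returned
    std::size_t entriesRead = 0;             // index entries examined
    std::size_t rangeScans = 0;              // one seek per IN-list value
};

// SELECT digdate, id WHERE itemid = item AND userid IN (users)
// ORDER BY digdate DESC, id DESC LIMIT limit
ScanResult indexOnlyQuery(const std::set<Key>& pk, int item,
                          const std::vector<int>& users, std::size_t limit) {
    ScanResult r;
    for (int user : users) {
        ++r.rangeScans;
        // Seek to the smallest key with the prefix (item, user).
        auto it = pk.lower_bound(Key{item, user, std::numeric_limits<long>::min(),
                                     std::numeric_limits<int>::min()});
        // Read entries while the (itemid, userid) prefix still matches.
        for (; it != pk.end() && std::get<0>(*it) == item && std::get<1>(*it) == user; ++it) {
            ++r.entriesRead;
            r.rows.emplace_back(std::get<2>(*it), std::get<3>(*it));
        }
    }
    // The "filesort": order the few matches by digdate DESC, id DESC.
    std::sort(r.rows.begin(), r.rows.end(), std::greater<>());
    if (r.rows.size() > limit) r.rows.resize(limit);
    return r;
}

// Entries an index on itemid alone would read for `item`.
std::size_t entriesForItem(const std::set<Key>& pk, int item) {
    std::size_t n = 0;
    auto it = pk.lower_bound(Key{item, std::numeric_limits<int>::min(),
                                 std::numeric_limits<long>::min(),
                                 std::numeric_limits<int>::min()});
    for (; it != pk.end() && std::get<0>(*it) == item; ++it) ++n;
    return n;
}
```

With 1,000 users who each have five diggs on item 50, the query for users 10, 20 and 30 reads 15 entries in 3 range scans, while an itemid-only index would read all 5,000 entries for item 50.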
Understanding Drizzle user authentication options – Part 2
Fri, 12 Mar 2010 22:45:26 +0000 - A key differentiator in Drizzle from its original MySQL roots is user-based authentication. Gone is the host/user and schema/table/column model that was stored in the MyISAM-based mysql.user table. Authentication is now completely pluggable, leveraging existing systems such as PAM, LDAP via PAM, and HTTP authentication. In this post I’ll talk about HTTP authentication, which requires an external HTTP server to implement successfully. You can look at Part 1 for PAM authentication.

Compiling for http auth support

By default, during compilation you may find:

checking for libcurl... no
configure: WARNING: libcurl development lib not found: not building auth_http plugin. On Debian this is found in libcurl4-gnutls-dev. On RedHat it's in libcurl-devel.

In my case I needed:

$ sudo yum install curl-devel

NOTE: Bug #527255 reports that this message is incorrect about libcurl-devel; however, it appears it may be valid on Fedora installs.

After successfully installing the necessary prerequisite you should see:

checking for libcurl... yes
checking how to link with libcurl... -lcurl
checking if libcurl has CURLOPT_USERNAME... no

HTTP Authentication

We need to enable the plugin at server startup.

$ sbin/drizzled --mysql-protocol-port=3399 --plugin_add=auth_http &

You need to ensure the auth_http plugin is active by checking the data dictionary plugin table.

drizzle> select * from data_dictionary.plugins where plugin_name='auth_http';
+-------------+----------------+-----------+-------------+
| PLUGIN_NAME | PLUGIN_TYPE    | IS_ACTIVE | MODULE_NAME |
+-------------+----------------+-----------+-------------+
| auth_http   | Authentication | TRUE      |             |
+-------------+----------------+-----------+-------------+

The auth_http plugin also has the following system variables.
drizzle> SHOW GLOBAL VARIABLES LIKE '%http%';
+------------------+-------------------+
| Variable_name    | Value             |
+------------------+-------------------+
| auth_http_enable | OFF               |
| auth_http_url    | http://localhost/ |
+------------------+-------------------+
2 rows in set (0 sec)

To configure HTTP authentication, you need to add the following settings to your drizzled.cnf file. For example:

$ cat etc/drizzled.cnf
[drizzled]
auth_http_enable=TRUE
auth_http_url=http://thedrizzler.com/auth

NOTE: Replace the domain name with one you control, even localhost.

After a Drizzle restart:

$ bin/drizzle -e "SHOW GLOBAL VARIABLES LIKE 'auth_http%'"
+------------------+-----------------------------+
| Variable_name    | Value                       |
+------------------+-----------------------------+
| auth_http_enable | ON                          |
| auth_http_url    | http://thedrizzler.com/auth |
+------------------+-----------------------------+

Currently, by default, account validation does not fail if the settings point to an invalid URL, and you can still log in. It is recommended that you always configure PAM authentication as well, as a fallback.

$ wget -O tmp http://thedrizzler.com/auth
--17:32:32--  http://thedrizzler.com/auth
Resolving thedrizzler.com... 208.43.73.220
Connecting to thedrizzler.com|208.43.73.220|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
17:32:32 ERROR 404: Not Found.

$ bin/drizzle
drizzle> exit

Configuring passwords

To configure your web server to perform the HTTP auth, you can use this Apache syntax as an example. The following is added to the VirtualHost entry in your web server configuration.

<Directory /var/www/drizzle/auth>
  AllowOverride FileInfo All AuthConfig
  AuthType Basic
  AuthName "Drizzle Access Only"
  AuthUserFile /home/drizzle/.authentication
  Require valid-user
</Directory>

$ sudo su -
$ mkdir /var/www/drizzle/auth
$ touch /var/www/drizzle/auth/index.htm
$ apachectl graceful

We now check that the URL requires credentials.
$ wget -O tmp http://thedrizzler.com/auth
--17:35:48--  http://thedrizzler.com/auth
Resolving thedrizzler.com... 208.43.73.220
Connecting to thedrizzler.com|208.43.73.220|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Authorization failed.

You need to create the username/password for access.

$ htpasswd -cb /home/drizzle/.authentication testuser sakila
$ cat /home/drizzle/.authentication
testuser:85/7CbdeVql4E

Confirm that HTTP auth works with the correct user/password.

$ wget -O tmp http://thedrizzler.com/auth --user=testuser --password=sakila
--17:37:45--  http://thedrizzler.com/auth
Resolving thedrizzler.com... 208.43.73.220
Connecting to thedrizzler.com|208.43.73.220|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently

Drizzle HTTP Authentication in action

Now, by default, we can't log in:

$ bin/drizzle
ERROR 1045 (28000): Access denied for user ''@'127.0.0.1' (using password: NO)
$ bin/drizzle --user=testuser --password=sakila999
ERROR 1045 (28000): Access denied for user 'testuser'@'127.0.0.1' (using password: YES)
$ bin/drizzle --user=testuser --password=sakila
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 6
Server version: 7 Source distribution (trunk)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle>
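The web server in this setup is doing ordinary HTTP Basic authentication (RFC 7617). The client sends `Authorization: Basic <base64 of user:password>`, which is what `wget --user=... --password=...` did above. The sketch below builds such a request with Python's standard library. It only illustrates the header format and is not the auth_http plugin's code. The helper names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Hypothetical helper: a GET request carrying Basic credentials.

    Passing it to urllib.request.urlopen() would raise HTTPError (401) for bad
    credentials, similar to the failed wget check above.
    """
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


if __name__ == "__main__":
    req = build_auth_request("http://thedrizzler.com/auth", "testuser", "sakila")
    print(req.get_header("Authorization"))  # Basic dGVzdHVzZXI6c2FraWxh
```

Basic credentials are only base64-encoded, not encrypted, so anyone who can see the traffic to the auth URL can read them.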
Understanding Drizzle user authentication options – Part 1
Fri, 12 Mar 2010 21:46:49 +0000 - A key differentiator between Drizzle and its original MySQL roots is user-based authentication. Gone is the host/user and schema/table/column model that was stored in the MyISAM-based mysql.user table. Authentication is now completely pluggable, leveraging existing systems such as PAM, LDAP via PAM, and HTTP authentication. In this post I'll talk about PAM authentication, which is effectively your current Linux-based user security. This information is based on the current build, 1317.

Compiling for PAM support

Your Drizzle environment needs to be compiled with PAM support. Otherwise you will have received the following warning when running configure:

$ ./configure
...
checking for libpam... no
configure: WARNING: Couldn't find PAM development support, pam_auth will not be built. On Debian, libpam is in libpam0g-dev. On RedHat it's in pam-devel.

The solution is provided in the warning message, which is another great thing about Drizzle. The dependency pre-checks and helpful messages like these far exceed the equivalent MySQL compilation process. In my case:

$ sudo yum install pam-devel

When correctly configured, it should look like:

checking for libpam... yes
checking how to link with libpam... -lpam

Working with PAM

You need to enable the PAM authentication plugin at drizzled startup.

$ sbin/drizzled --plugin_add=auth_pam &

Unfortunately, connecting does not work:

$ time sbin/drizzle --user=testuser --password=***** --port=4427

real 0m0.003s
user 0m0.003s
sys 0m0.001s

A look into the source at src/drizzle-2010.03.1317/plugin/auth_pam/auth_pam.cc shows that it needs a PAM service configuration named check_user:

retval= pam_start("check_user", userinfo.name, &conv_info, &pamh);

Configuring PAM

To enable PAM with Drizzle, you need the following system configuration.
$ cat /etc/pam.d/check_user
auth required pam_unix.so
account required pam_unix.so

$ time sbin/drizzle --user=testuser --password=***** --port=4427
ERROR 1045 (28000): Access denied for user 'testuser'@'127.0.0.1' (using password: YES)

real 0m2.055s
user 0m0.002s
sys 0m0.002s

This did some validation but still failed. It seems Bug #484069 may fix this problem; however, that fix is not currently in the main line. Stay tuned!
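Because auth_pam calls `pam_start("check_user", ...)`, PAM looks for a service file named `check_user` under /etc/pam.d. A quick sanity check is to confirm that this file exists and declares both an `auth` and an `account` rule. The parser below is a minimal, hypothetical helper. It handles only the plain `type control module [args]` form shown above, and it skips comments and `@include` lines without checking them.

```python
from pathlib import Path


def parse_pam_service(text: str) -> dict:
    """Map each PAM group (auth, account, ...) to its (control, module) rules.

    Minimal sketch: only plain 'type control module [args]' lines are handled.
    """
    rules: dict = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        parts = line.split()
        if len(parts) < 3:
            continue  # blank lines, '@include file' directives, etc.
        group, control, module = parts[0], parts[1], parts[2]
        rules.setdefault(group, []).append((control, module))
    return rules


def check_drizzle_pam(path: str = "/etc/pam.d/check_user") -> list:
    """Return a list of problems found in the PAM service file (empty means OK)."""
    p = Path(path)
    if not p.exists():
        return [f"{path} is missing"]
    rules = parse_pam_service(p.read_text())
    return [f"no '{group}' rule in {path}" for group in ("auth", "account") if group not in rules]
```

An empty list from `check_drizzle_pam()` only means the file looks plausible. It does not prove that PAM will accept a given user.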
Thoughts about working in a distributed organization
Fri, 12 Mar 2010 20:00:00 +0000 - I've been working in a fully distributed work environment for almost 8 years now (I joined MySQL AB in April 2002). So I read Toni Schneider's blog post about the "5 reasons why your company should be distributed" with great interest. He raised several points that I fully agree with and that I covered in my talks about "Working for a virtual company - how we do it at MySQL" at last year's next09 conference (slides, video) and at FrOSCon 2009 (video). However, Toni paints an overly rosy picture here. As my dear colleague Dean pointed out, "The blog overly simplifies the realities of a distributed workforce, making it sound like it's all ponies and rainbows".
451 CAOS Links 2010.03.12
Fri, 12 Mar 2010 18:06:55 +0000 - Updating the MPL. Funding for Lucid and eXo. StatusNet. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca
“Tracking the open source news wires, so you don’t have to.”

Updating the MPL
# ZDNet reported that the 10-year-old Mozilla Public License will be updated by the end of 2010, while Mitchell Baker explained the process.

Funding for Lucid and eXo
# Lucid Imagination raised $10m in series B funding from Shasta Ventures, Granite Ventures and Walden International.
# eXo Platform raised $6m from Auriga Partners and XAnge Capital and confirmed Bob Bickel as its chairman.

Status check
# StatusNet launched the StatusNet Cloud Service (SCS) into public beta, while OStatic published a Q&A with StatusNet’s CEO on the future of the open source microblogging platform provider.

Busy week for Simon Phipps
# Sun’s chief open source officer, Simon Phipps, confirmed he will not be joining Oracle, and also confirmed his election as a director of the Open Source Initiative. Joe Brockmeier asked if Simon Phipps will be able to energize the OSI.

# Jay Pipes confirmed that he and many of the Sun Drizzle team are now working at Rackspace Cloud.
# Open Source for America responded to the IIPA’s attack on open source.
# In the first of a series of articles, OpenNMS’s Tarus Balog explained what it takes to build an open source business. http://bit.ly/cTbQq1
# Bloomberg reported that Elliott plans to sell Novell’s NetWare and Linux units; Elliott denied it.
# Engine Yard claimed to have tripled its customer base in the last six months to reach 1,000 customers.
# SpringSource introduced SpringSource tc Server Spring Edition.
# Dirk Riehle outlined the three areas of open source economics.
# The VAR Guy speculated about Red Hat’s apparently imminent move into business intelligence.
# Digg explained its move from MySQL to Apache Cassandra.
# Appcelerator Titanium 1.0 is now generally available.
# SugarCRM launched its Open+ Partner Program.
# Terracotta announced the availability of Ehcache 2.0 as well as upgrades to Terracotta Web Sessions.
# MySQL/Memcached appliance vendor Schooner was ranked 34th on the WSJ’s list of the top 50 venture-backed companies, while Groundwork Open Source was ranked 28th.
# OSS Watch published an explanation of why the threat to copyleft licenses is not proliferation, but incompatibility.
# A short but sweet explanation of Cloudera’s formation and raison d’être.
# An interview with WaveMaker CEO Chris Keene on commercial open source licensing, business and community strategies.
# Squiz updated its MySource Matrix open source CMS with formal support for Funnelback Search.
# The creators of the Hypertable open source distributed (NoSQL) database have formed Hypertable Inc.
Log Buffer #182, a Carnival of the Vanities for DBAs
Fri, 12 Mar 2010 17:01:59 +0000 - This is the 182nd edition of Log Buffer, the weekly review of database blogs. Make sure to read the whole edition so you do not miss where to submit your SQL limerick! This week started out with me posting about International Women’s Day, and has me personally attending Confoo (Montreal), which is an excellent conference I hope to return to next year. I learned a lot at Confoo, especially from the Blending NoSQL and SQL session I attended. This week was also the Hotsos Symposium. Doug’s Oracle Blog has a series of posts about Hotsos. If all this talk about conferences has gotten you excited, Joshua Drake notes that, with 14 days to go, the hotel is almost full for PostgreSQL Conference East, which runs March 25th-28th in Philadelphia. And the Oracle database insider notes that the Oracle OpenWorld call for papers is now open. According to Susan Visser, this week (ending tomorrow) is also Read an E-Book Week. So if you have not already done so, read an e-book! She links a coupon for an e-book in the post. Craig Mullins argues that the mainframe is a good career choice in Mainframes: The Safe IT Career Choice, and notes that the mainframe is still not dead: People having been predicting the death of the mainframe since the advent of client/server in the late 1980s. That is more than 20 years! Think of all the things that have died in that timespan while the mainframe keeps on chugging away: IBM’s PC business, Circuit City, Koogle peanut butter, public pay phones, Johnny Cash… the list is endless. In other career-related news, Antonio Cangiano is looking for [2] top-notch student hackers for a 16-month internship at IBM in Toronto starting in May. All the details, including how to apply, are in Cangiano’s blog post. Willie Favero wants to know how you “solve the batch dilemma” for issues like “shrinking your batch window, designing your batch to play nicely with … OLTP” in how’s your batch workload doing?
Perhaps Favero should read the updated batch best practices posted by Anthony Shorten. Bryan Smith asks a more personal question in don’t ask, don’t tell, bi-platform DBAs: do you go both ways and “manage both DB2 for Linux, UNIX, and Windows and DB2 for z/OS”? This week’s Log Buffer editor admits to being a tri-platform DBA: she has tried many platforms, and in fact, many databases (MySQL, Oracle, DB2, SQL Server, Sybase, Postgres and Ingres)! Hari Prasanna Srinivasan promotes a patching survey in Oracle really wants to hear from you! Patching Survey. Henrik Loeser explains what a deadlock and a hot spot are by using a real-life analogy taken from a police report in deadlock and hot spot in real life. Jamie Thomson asks why do you abbreviate schema names? Shlomi Noach tries to solve the issue that “there is no consistent convention as for how to write [about table aliases in] an SQL query” in proper sql table alias use conventions. Noach also gives us a tip: faster than truncate. Leons Petrazickis reminds us that “rulesets are chains” and that it is important to have your rulesets in the proper order in iptables firewall pitfall. Anyone interested in the history of MySQL AB should read Dries Buytaert’s article. Gavin Towey shares his software that helps centrally manage 120 MySQL servers in qsh.pl: distributed query tool. For those who want to learn more about column-oriented databases, particularly in MySQL, Robin Schumacher of the InfiniDB blog announces that a MySQL University session recording on MySQL column databases is now available. MySQL join-fu expert Jay Pipes has moved his blog to www.joinfu.com and starts with An SQL Puzzle and, of course, a follow up on the sql puzzle. Ivan Zoratti is happy that finally, slides posted for the MySQL DW breakfast. Venu Anuganti gives you tips on one of the most common MySQL frustrations, optimizing subqueries, in how to improve subqueries derived tables performance.
Justin Swanhart shows how he gets Linux performance information from your MySQL database without shell access, and uses the same method to emulate a ‘top’ CPU summary using /proc/stat and MySQL. The Oracle Apps blog has an introduction to Oracle user productivity kit (UPK). In this editor’s opinion the article is very sales-pitchy, but it has valuable information and does indeed live up to its promise: UPK is a software tool that can capture all the steps in a system process. It records every keystroke, every click of the mouse, each menu option chosen and each button pressed. All this is done in the UPK Recorder by going through the transaction and pressing “printscreen” after every user action. From this, without any further effort from the developer, UPK builds a number of valuable outputs. Allen White gives a great tip on how to optimize queries in keep your data clean. Mike Dietrich reminds you to remove “old” parameters and events from your init.ora when upgrading, “as keeping them will definitely slow down the database performance in the new release.” He shows evidence of slowness when this is not done. Dietrich also shows how you can gather workload statistics “to give the optimizer some good knowledge about how powerful your IO-system might be”, especially “a few days after upgrading to the new release…while a real workload is running.” Brian Aker shows the exciting features coming soon in Drizzle in Drizzle, Cherry, Roadmap for our Next Release. Maybe you are thinking of migrating, not upgrading… The O’Reilly Radar shows how to assess an Oracle to MySQL migration in MySQL migration and risk management. Actually, that article interviews Ronald Bradford on the subject. Bradford has been prolific lately, updating his free my.cnf advice series and his “Don’t Assume”: MySQL for the Oracle DBA series.
Nick Quarmby also talks about migrating Oracle, not to a new database but to a new platform, in his primer on migrating Oracle Applications to new platforms. And the big news comes from Carlos of dataprix: Twitter will migrate from MySQL to Cassandra DB. Paul S. Randal explains his way of benchmarking: 1 Tb table population on SQL Server. Pete Finnigan shares his slides from a webinar on how to secure Oracle, and Denis Pilipchuk shares his approaches for discovering security vulnerabilities in software applications. Jeff Davis shares his thoughts about scalability and the relational model. Robert Treat responds that actually, the relational model doesn’t scale, and Baron Schwartz counters with NoSQL doesn’t mean non-relational. Buck Woody explains that “whenever you want to know something about SQL Server’s configuration, whether that’s the Instance itself or a database, you have a few options” (and, of course, what those options are) in system variables, stored procedures or functions for meta data. This week’s T-SQL Tuesday topic was I/O. There are many links to great blog posts in the comments. Here are three random posts I chose to highlight. Michael Zilberstein talks about IO capacity planning, Kalen Delaney talks about using STATISTICS IO in I/O, you know, and Merrill Aldrich chimes in with information on real world SSD’s. Aldrich also begs folks not to waste resources and make more work for developers and DBAs in dear ISV, you’re keeping me awake nights with your VARCHAR() dates. And we end with a bit of fun: Paul Nielsen has posted an SQL limerick and asks readers to create their own in there once was in Dublin a query.
Using ext4 for MySQL
Fri, 12 Mar 2010 16:51:32 +0000 - This week, with a client, I saw ext4 used for the first time on a production MySQL system, running Ubuntu 9.10 (Karmic Koala). While installing 9.10 Server locally today, I noticed that ext4 is the default option. The ext4 filesystem is described as offering better performance, reliability and features, and there is also information about improvements in journaling. At OSCON 2009 I attended a presentation on Linux Filesystem Performance for Databases by Selena Deckelmann, which included ext4. While ext4 provided some improvements in sequential reading and writing, there were issues with random I/O, which is the key workload for RDBMS products. Is the RAID configuration (e.g. RAID 5, RAID 10), stripe size, buffer caches, LVM etc. more important than upgrading from ext3 to ext4? I don't have access to any test equipment to determine this myself. However, I'd like to hear about any experiences from members of the MySQL community, and whether anybody has experienced general problems running ext4.

ext4 References
Ext 4 How To on kernel.org
Ext4 on kernelnewbies.org
ext4 overview via wikipedia.org
First benchmarks of the ext4 file system
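Before comparing ext3 and ext4, it helps to confirm which filesystem the MySQL data directory actually sits on. The sketch below scans /proc/mounts-style text and picks the longest mount point that contains a given path. It assumes Linux and a datadir of /var/lib/mysql, so check `datadir` in your my.cnf. It also ignores the octal escaping (such as `\040` for spaces) that /proc/mounts uses.

```python
import os


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # The mount contains `path` if they are equal or `path` lies below it.
        if path == mount_point or path.startswith(mount_point.rstrip("/") + "/"):
            # Longest match wins; on ties the later entry (an over-mount) wins.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    datadir = os.path.realpath("/var/lib/mysql")  # assumed datadir location
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(datadir, "is on", fs_type_for(datadir, f.read()))
```

On a system like the client's, this should report ext4 for the datadir. With that confirmed, any benchmark compares the filesystem you actually run.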
The Art of "What is going on inside of my database?"
Fri, 12 Mar 2010 16:50:02 +0000 - Yesterday we were having a conversation on IRC about the need for more useful information about the internals of the database. "SHOW STATUS" is just too primitive in its design to provide the sort of detailed information you need to do operations.

Yesterday we got a bug report about the number of "open tables" found after a particular query. The user had assumed the number was off. What they hadn't realized was that the number was accurate (in this particular case, MySQL fudges the number of open tables because it can't count its derived tables).

One of the patches coming into the tree right now fully exposes the contents of the table cache and table definition cache to the user. You can see who holds what locks on what tables, and you can see the actual count of table accesses per table.

drizzle> select * from TABLE_DEFINITION_CACHE;
+-----------------+------------------------+---------+-------------+----------------+
| TABLE_SCHEMA    | TABLE_NAME             | VERSION | TABLE_COUNT | IS_NAME_LOCKED |
+-----------------+------------------------+---------+-------------+----------------+
| data_dictionary | schema_names           |       1 |           1 | FALSE          |
| data_dictionary | table_definition_cache |       1 |           1 | FALSE          |
| data_dictionary | show_tables            |       1 |           1 | FALSE          |
+-----------------+------------------------+---------+-------------+----------------+
3 rows in set (0 sec)

drizzle> select * from TABLE_CACHE;
+------------+-----------------+------------------------+-----------+----------------+---------+----------------+------+----------------+------------+----------------+
| SESSION_ID | TABLE_SCHEMA    | TABLE_NAME             | ARCHETYPE | ENGINE         | VERSION | IS_NAME_LOCKED | ROWS | AVG_ROW_LENGTH | TABLE_SIZE | AUTO_INCREMENT |
+------------+-----------------+------------------------+-----------+----------------+---------+----------------+------+----------------+------------+----------------+
|          0 | data_dictionary | schema_names           | FUNCTION  | FunctionEngine |       1 | FALSE          |  100 |            260 |          0 |              0 |
|          0 | data_dictionary | show_tables            | FUNCTION  | FunctionEngine |       1 | FALSE          |  100 |            260 |          0 |              0 |
|          1 | data_dictionary | table_cache            | FUNCTION  | FunctionEngine |       1 | FALSE          |  100 |           1113 |          0 |              0 |
|          0 | data_dictionary | table_definition_cache | FUNCTION  | FunctionEngine |       1 | FALSE          |  100 |            559 |          0 |              0 |
+------------+-----------------+------------------------+-----------+----------------+---------+----------------+------+----------------+------------+----------------+
4 rows in set (0 sec)

drizzle> select * from TABLE_DEFINITION_CACHE WHERE TABLE_COUNT > 1;
Empty set (0 sec)

drizzle> select * from TABLE_DEFINITION_CACHE WHERE TABLE_COUNT > 0;
+-----------------+------------------------+---------+-------------+----------------+
| TABLE_SCHEMA    | TABLE_NAME             | VERSION | TABLE_COUNT | IS_NAME_LOCKED |
+-----------------+------------------------+---------+-------------+----------------+
| data_dictionary | schema_names           |       1 |           1 | FALSE          |
| data_dictionary | table_cache            |       1 |           1 | FALSE          |
| data_dictionary | table_definition_cache |       1 |           1 | FALSE          |
| data_dictionary | show_tables            |       1 |           1 | FALSE          |
+-----------------+------------------------+---------+-------------+----------------+
4 rows in set (0 sec)

drizzle> select count(*) from TABLE_DEFINITION_CACHE WHERE TABLE_COUNT > 0;
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0 sec)

The term "ARCHETYPE" is the base primitive describing what sort of table was used. It is more detailed than the ANSI "TABLE_TYPE" that exists in I_S. We still have a debate on what exactly this term should mean. One of the things I enjoy about working on Drizzle? I am not stuck in a room full of people who will spend hours on this sort of bike-shed decision.

Version gives you the current definition count for the table.
Right now that number is still based on "since opened", but we will soon be storing the metadata for this, so you will know how many times in the life of an object it has been changed.

We are still working out the details of SHOW TABLE STATUS. Our SHOW commands are just query rewrites to tables. Here is a partial example of the new table that is output from a SHOW TABLE STATUS:

+---------+--------+-------------------+-----------+
| Session | Schema | Name              | Type      |
+---------+--------+-------------------+-----------+
|       0 | Schema | b                 | STANDARD  |
|       0 | Schema | show_tables       | FUNCTION  |
|       1 | Schema | show_table_status | FUNCTION  |
|       0 | Schema | schema_names      | FUNCTION  |
|       0 | Schema | dfsdf             | STANDARD  |
|       1 | Schema | b                 | TEMPORARY |
|       1 | Schema | a                 | TEMPORARY |
+---------+--------+-------------------+-----------+
7 rows in set (0 sec)

Notice Session? Notice that Type can be Temporary? With our system you can see the current owner of the open table, and we now include whatever temporary tables you have in your own session. We have also included a larger table in which you can see just your own temporary tables (and most likely I will soon create a table so that you can see all temporary tables open across all sessions). The current "Type" in SHOW TABLE STATUS is an Archetype, so we will be changing that so that terms match up across the database. Consistency in design is an awesome thing :)

There is a lot more to come!

P.S. Just wait until I push the code for tracking locks in Drizzle. I demoed it at SCALE and got a lot of both positive and constructive feedback on it.
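"SHOW commands are just query rewrites to tables" means that a SHOW statement is turned into an ordinary SELECT against a data_dictionary table. The toy rewriter below illustrates that idea only. The mapping is hypothetical: the target names are borrowed from the data_dictionary listings above, and Drizzle's real rewrite rules may differ.

```python
import re

# Hypothetical rewrite table; names borrowed from the listings above,
# not taken from Drizzle's source.
SHOW_REWRITES = {
    "TABLES": "data_dictionary.show_tables",
    "TABLE STATUS": "data_dictionary.show_table_status",
}

_SHOW_RE = re.compile(r"\s*SHOW\s+(TABLE\s+STATUS|TABLES)\s*;?\s*", re.IGNORECASE)


def rewrite_show(statement: str) -> str:
    """Rewrite a supported SHOW statement into a SELECT on a data_dictionary table.

    Statements that are not recognised are returned unchanged.
    """
    match = _SHOW_RE.fullmatch(statement)
    if match is None:
        return statement
    # Normalise case and internal whitespace, e.g. "table   status" -> "TABLE STATUS".
    key = " ".join(match.group(1).upper().split())
    return f"SELECT * FROM {SHOW_REWRITES[key]}"


if __name__ == "__main__":
    print(rewrite_show("SHOW TABLE STATUS"))
```

The appeal of this design is that SHOW output becomes queryable like any other table, so it can be filtered with WHERE, as in the TABLE_COUNT > 0 examples above.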
But I DO want MySQL to say “ERROR”!
Fri, 12 Mar 2010 04:53:28 +0000 - MySQL is known for its willingness to accept invalid queries and data values. It can silently commit your transaction or truncate your data.
Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings, though).
Calling CREATE TEMPORARY TABLE? You get a silent commit.
Issuing a ROLLBACK when non-transactional engines are involved? Have a warning; no error.
Using LOCK IN SHARE MODE on non-transactional tables? Not a problem. Nothing reported.
Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
Inserting 300 into a TINYINT UNSIGNED column in a relaxed sql_mode? Give me 255, I'll silently drop the remaining 45. I owe you. (A signed TINYINT would get 127.)

Warnings and errors

It would be nice to:
Have an auto_propagate_warning_to_error server variable (global/session/both) which, well, does what it says.
Have an i_am_really_not_a_dummy server variable which implies stricter checks for all the above and prevents you from doing anything that may be problematic (or rolls back your transactions on your invalid actions).

Connectors may be nice enough to propagate warnings to errors, and that's good. But it's not enough, since by then the data has already been committed in MySQL.

If I understand correctly (and maybe it's just a myth), it all relates to the times when MySQL wanted widespread adoption across the internet, and so tried not to interfere too much with its users (hence the common myth that "MySQL just works out of the box and does not require me to configure or understand anything"). MySQL is a database system, it is now widespread, and it is used by serious companies and products. It is time to stop playing nice with everyone and provide strict integrity. Or, be nice to everyone, but allow me to specify what "nice" means for me.
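To make the TINYINT example concrete: in a relaxed sql_mode, MySQL clamps an out-of-range value to the nearest end of the column's range and only raises a warning. A strict mode such as STRICT_ALL_TABLES turns this into an error instead. The Python sketch below only emulates that rule for TINYINT, outside MySQL. The function and exception names are hypothetical.

```python
class OutOfRangeError(ValueError):
    """Hypothetical stand-in for the error a strict sql_mode reports."""


TINYINT_RANGES = {
    False: (-128, 127),  # signed TINYINT
    True: (0, 255),      # TINYINT UNSIGNED
}


def store_tinyint(value: int, unsigned: bool = False, strict: bool = False):
    """Emulate storing `value` in a TINYINT column.

    Returns (stored_value, warnings). Relaxed mode clamps and warns;
    strict mode raises instead.
    """
    low, high = TINYINT_RANGES[unsigned]
    if low <= value <= high:
        return value, []
    if strict:
        raise OutOfRangeError(f"Out of range value {value} for TINYINT")
    clamped = max(low, min(high, value))
    return clamped, [f"value {value} adjusted to {clamped}"]


if __name__ == "__main__":
    print(store_tinyint(300, unsigned=True))  # (255, ['value 300 adjusted to 255'])
    print(store_tinyint(300))                 # (127, ['value 300 adjusted to 127'])
```

The warning list is the only trace of the truncation, which is exactly the author's complaint: unless the client checks warnings, the clamped value goes in unnoticed.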
A Follow Up on the SQL Puzzle
Thu, 11 Mar 2010 22:12:35 +0000 - Or… What the Heck is Wrong with CREATE TABLE IF NOT EXISTS ... SELECT?

So, earlier this week, I blogged about an SQL puzzle that had come up in my current work on Drizzle’s new transaction log. I asked readers what the “correct” result of the following would be:

CREATE TABLE t1 (a int, b int);
INSERT INTO t1 VALUES (1,1),(1,2);
CREATE TEMPORARY TABLE t2 (a int, b int, PRIMARY KEY (a));
BEGIN;
INSERT INTO t2 VALUES (100,100);
CREATE TEMPORARY TABLE IF NOT EXISTS t2 (PRIMARY KEY (a)) SELECT * FROM t1;

# The above statement will correctly produce an ERROR 23000: Duplicate entry '1' for key 'PRIMARY'
# What should the below result be?

SELECT * FROM t2;
COMMIT;

A number of readers responded, and, to be fair, most everyone was “correct” in their own way. Why? Well, because the way that MySQL deals with calls to CREATE TABLE ... SELECT, CREATE TABLE IF NOT EXISTS ... SELECT and their temporary-table counterparts is completely stupid, as I learned this week. Rob Wultsch essentially sums up my feelings about the behaviour of DDL statements with regard to transactions in a session:

Implicit commit is evil and stupid. Ideally we the server should error and roll back, imho.

The Officially Correct Answer (at least in MySQL)

OK, so here’s the “official” correct answer: CREATE TABLE IF NOT EXISTS ... SELECT does not first check for the existence of the table in question. Instead, if the table in question does exist, CREATE TABLE IF NOT EXISTS ... SELECT behaves like an INSERT INTO ... SELECT statement. Yep, you heard right. So, instead of throwing a warning when it notices that the table exists, MySQL attempts to insert rows from the SELECT query into the existing table. Here is the official MySQL explanation:

For CREATE TABLE … SELECT, if IF NOT EXISTS is given and the table already exists, MySQL handles the statement as follows:

* The table definition given in the CREATE TABLE part is ignored. No error occurs, even if the definition does not match that of the existing table.
* If there is a mismatch between the number of columns in the table and the number of columns produced by the SELECT part, the selected values are assigned to the rightmost columns. For example, if the table contains n columns and the SELECT produces m columns, where m < n, the selected values are assigned to the m rightmost columns in the table. Each of the initial n – m columns is assigned its default value, either that specified explicitly in the column definition or the implicit column data type default if the definition contains no default. If the SELECT part produces too many columns (m > n), an error occurs.
* If strict SQL mode is enabled and any of these initial columns do not have an explicit default value, the statement fails with an error.

So, given the above manual explanation, the correct answer to the original blog post is:

a   | b
100 | 100

This is partly because there is an implicit COMMIT directly before the CREATE TABLE is executed (committing the 100,100 record to the table), and partly because the primary key violation kills off the INSERTs of 1,1 in InnoDB. For a MyISAM table, the 1,1 record would be in the table, since MyISAM has no idea what a ROLLBACK is.

I Think Drizzle Should Follow PostgreSQL’s Example Here

On implicit commits before DDL operations, I believe they should all go bye-bye. DDL should be transactional in Drizzle, and if a statement cannot be executed in a transaction, it should throw an error if there is an active transaction. Period. As for CREATE TABLE ... SELECT acting like an INSERT INTO ... SELECT, that entire code path should be ripped out. PostgreSQL’s DDL operations, IMHO, are sane. Sane is good. PostgreSQL allows quite a bit of flexibility by implementing the SQL standard’s CREATE TABLE and CREATE TABLE AS statements. I believe Drizzle should scrap all the DDL table-creation code and instead implement PostgreSQL’s much nicer DDL methods.
There, I said it. Slashdot MySQL haters, there ya go.
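The column-assignment rule quoted from the manual is easier to follow in code. This Python sketch only emulates that documented behaviour and is not MySQL code. Columns are modelled with a hypothetical `Column` type. As a simplification, `explicit_default=None` means "no DEFAULT clause", so an explicit DEFAULT NULL cannot be represented.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Column:
    name: str
    explicit_default: Optional[Any] = None  # None = no DEFAULT clause (simplification)
    implicit_default: Any = 0               # the type's implicit default, e.g. 0 for INT


def assign_selected_row(columns, selected, strict=False):
    """Emulate how one SELECT row is mapped onto an existing table by
    CREATE TABLE IF NOT EXISTS ... SELECT, following the manual's description."""
    n, m = len(columns), len(selected)
    if m > n:
        raise ValueError("SELECT produces more columns than the table has")
    row = {}
    # The initial n - m columns receive defaults...
    for col in columns[: n - m]:
        if col.explicit_default is not None:
            row[col.name] = col.explicit_default
        elif strict:
            raise ValueError(f"column {col.name!r} has no explicit default (strict mode)")
        else:
            row[col.name] = col.implicit_default
    # ...and the selected values fill the m rightmost columns, in order.
    for col, value in zip(columns[n - m:], selected):
        row[col.name] = value
    return row


if __name__ == "__main__":
    table = [Column("a"), Column("b"), Column("c")]
    print(assign_selected_row(table, (1, 2)))  # {'a': 0, 'b': 1, 'c': 2}
```

In the puzzle itself the SELECT produces exactly as many columns as t2 has, so every value lands where you would expect. The rightmost-column rule only surprises you when the counts differ.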
Drizzle’s Data Dictionary and Global Status
Thu, 11 Mar 2010 21:33:14 +0000 - With Brian's recent news about the Data Dictionary replacing INFORMATION_SCHEMA in Drizzle, I was looking into the server status variables (aka INFORMATION_SCHEMA.GLOBAL_STATUS), and I came across an interesting discovery.

select * from data_dictionary.global_status;
...
| Table_locks_immediate      | 0              |
| Table_locks_waited         | 0              |
| Threads_connected          | 8134064        |
| Uptime                     | 332            |
| Uptime_since_flush_status  | 332            |
+----------------------------+----------------+
51 rows in set (0 sec)

This only retrieved 51 rows, far fewer than before. What I wanted was clearly missing: all the old com_ status variables. Looking at what the data_dictionary actually has available revealed a new table.

drizzle> select * from data_dictionary.global_statements;
+-----------------------+----------------+
| VARIABLE_NAME         | VARIABLE_VALUE |
+-----------------------+----------------+
| admin_commands        | 0              |
| alter_db              | 0              |
| alter_table           | 0              |
| analyze               | 0              |
| begin                 | 0              |
| change_db             | 1              |
| check                 | 0              |
| checksum              | 0              |
| commit                | 0              |
| create_db             | 0              |
| create_index          | 0              |
| create_table          | 0              |
| delete                | 0              |
| drop_db               | 0              |
| drop_index            | 0              |
| drop_table            | 0              |
| empty_query           | 0              |
| flush                 | 0              |
| insert                | 0              |
| insert_select         | 0              |
| kill                  | 0              |
| load                  | 0              |
| release_savepoint     | 0              |
| rename_table          | 0              |
| replace               | 0              |
| replace_select        | 0              |
| rollback              | 0              |
| rollback_to_savepoint | 0              |
| savepoint             | 0              |
| select                | 10             |
| set_option            | 0              |
| show_create_db        | 0              |
| show_create_table     | 0              |
| show_errors           | 0              |
| show_warnings         | 0              |
| truncate              | 0              |
| unlock_tables         | 0              |
| update                | 0              |
+-----------------------+----------------+
38 rows in set (0 sec)

Kudos for this. Looking at the list, I saw one obvious omission: “ping”. This caught me out some years ago with huge admin_commands counts (300-500 per second). I’m also a fan of Mark’s recent work, An evening hack – Com_ping in MySQL.
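Counters such as those in global_statements only ever increase. A figure like "300-500 admin_commands per second" therefore comes from sampling the counters twice and dividing the difference by the elapsed time. Below is a minimal Python sketch. It assumes you have already fetched two snapshots of VARIABLE_NAME/VARIABLE_VALUE pairs, and the sample numbers are hypothetical.

```python
def per_second_rates(before: dict, after: dict, elapsed_seconds: float) -> dict:
    """Compute per-second rates between two counter snapshots.

    Each snapshot maps VARIABLE_NAME -> VARIABLE_VALUE, as returned by
    SELECT * FROM data_dictionary.global_statements (values may be strings).
    A negative rate means the counters were reset between samples.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for name, new_value in after.items():
        if name in before:
            delta = int(new_value) - int(before[name])
            rates[name] = delta / elapsed_seconds
    return rates


if __name__ == "__main__":
    # Hypothetical snapshots taken 10 seconds apart.
    first = {"admin_commands": "1000", "select": "10"}
    second = {"admin_commands": "5000", "select": "60"}
    print(per_second_rates(first, second, 10))  # {'admin_commands': 400.0, 'select': 5.0}
```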
Liveblogging at Confoo: Blending NoSQL and SQL
Thu, 11 Mar 2010 16:11:53 +0000 - Persistence Smoothie: Blending NoSQL and SQL (see user feedback and comments at http://joind.in/talk/view/1332).

Michael Bleigh is from Intridea, high-end Ruby and Ruby on Rails consultants who build apps from start to finish and make them scalable. He’s written a lot of stuff, available at http://github.com/intridea. @mbleigh on Twitter.

NoSQL is a new way to think about persistence. Most NoSQL systems are not ACID compliant (Atomicity, Consistency, Isolation, Durability). Generally, most NoSQL systems have:
* Denormalization
* Eventual Consistency
* Schema-Free
* Horizontal Scale

NoSQL tries to scale (more) simply, and it is starting to go mainstream: NY Times, BBC, SourceForge, Digg, Sony, ShopWiki, Meebo, and more. But it’s not *entirely* mainstream; it’s still hard to sell due to compliance and other reasons. NoSQL has gotten very popular, with lots of blog posts about it, but it has reached the peak of its hype, and obviously it can’t do everything. “NoSQL is a (growing) collection of tools, not a new way of life.”

What is NoSQL? It can be several things:
* Key-Value Stores
* Document Databases
* Column-oriented data stores
* Graph Databases

Key-Value Stores

memcached is a “big hash in the sky”: it is a key-value store. Similarly, NoSQL key-value stores “add to that big hash in the sky” and store to disk. The speaker’s favorite is Redis because it’s similar to memcached.
Redis is a key-value store plus datatypes (lists, sets, scored sets, and soon hashes), with cache-like functions (such as expiration), and it is (mostly) in-memory.

Another interesting key-value store is Riak. It is a combination of a key-value store and a document database, and it is heavy on HTTP REST. You can create links between documents and do “link walking”, which you don’t normally get out of a key-value store. It has built-in MapReduce.

MapReduce is a massively parallel way to process large datasets. First you scour the data and “map” it into a new set of data; then you “reduce” that data down to a salient result. For example, a MapReduce job that builds a tag cloud: the map function makes an array with a tag name and a count of 1 for each instance of that tag, and the reduce function goes through that array and adds up the counts. See http://en.wikipedia.org/wiki/MapReduce.

Other key-value stores: Tokyo Cabinet, Dynomite, memcachedDB, and Voldemort.

Document Databases

Some say document databases are the “closest” thing to real SQL.

MongoDB is a document store that speaks BSON (Binary JSON, which is compact). This is the speaker’s favorite, because it has a rich query syntax that makes it close to SQL. It can’t do joins, but it can embed objects in other objects, so it’s a tradeoff. It also has GridFS, which can store large files efficiently and can scale to petabytes of data. It does have MapReduce, but it’s deliberate: you run it every so often.

CouchDB is a pure JSON document store. You can query it directly with nearly pure JavaScript (there are auth issues), and it’s an interesting paradigm to be able to run your app almost entirely through JavaScript. It has an HTTP REST interface, and MapReduce is the only way to view items in CouchDB. The MapReduce is incremental: every time you add or modify a document, the results of the map and reduce functions you’ve written are updated rather than recomputed from scratch. You can do really powerful queries as easily as simple ones. However, some things are really complex; for example, pagination is almost impossible to do. Intelligent replication: CouchDB is designed to work with offline integration.
It could be used instead of SQLite as the HTML5 data store, but you need CouchDB running locally to do offline work with it.

Column-Oriented Stores

Columns (e.g. names) are stored together, instead of rows. This lets you be schema-less, because you don’t care about a row’s consistency; you can add a column to a table very easily. Examples: Cassandra (built by Facebook, also used by Twitter), BigTable, Hypertable, and HBase.

Graph Databases

In the speaker’s opinion, there aren’t enough of these. Neo4J can handle modeling complex relationships (“friends of friends of cousins”), but it requires a license.

When should I use this stuff?

If you have complex, slow joins for an “activity stream”: denormalize, and use a key-value store.
If you have a variable schema and vertical interaction: use a document database or a column store.
If you are modeling multi-step relationships (LinkedIn, friends of friends, etc.): use a graph database.

Don’t look for a single tool that does every job. Use more than one if it’s appropriate, and weigh the tradeoffs (i.e., don’t have 7 different data stores either!). NoSQL solves real scalability and data design issues, but financial transactions HAVE to be atomic, so don’t use NoSQL for those. A good presentation is http://www.slideshare.net/bscofield/the-state-of-nosql.

Using SQL and NoSQL Together

Why? Well, your data is (most likely) already in an SQL database. You can blend by hand, but the easy way is DataMapper: a generic, relational ORM (with adapters for many SQL databases and many NoSQL stores) that implements Identity Map and uses module-based inclusion (instead of extending from a class, you just include it into a class). You can set up multiple data targets (the default is MySQL; the example sets up MongoDB too).

DataMapper is the ultimate polyglot ORM: simple relationships between persistence engines are easy, but it is a jack of all trades and master of none. It sometimes perpetuates false assumptions: if you’re in Ruby, your legacy stuff is in ActiveRecord, so you’re going to have to rewrite your code anyway.
The speaker’s idea for being less generic and making better use of the features of each data store is Gloo: “Gloo glues together different ORMs by providing relationship proxies.” This software is ALPHA ALPHA ALPHA. The goal is to be able to define relationships on the terms of any ORM, from any class, ORM or not. Right now, it has partially working ActiveRecord relationships. Is he doing it wrong? Is it a crazy/stupid idea? Maybe.

Example (need: use):
You already have an auth system: it’s already in SQL, so leave it there.
Users need to be able to purchase items from the storefront; you can’t lose transactions and need full ACID compliance: use MySQL.
A social graph with activity streams and one-way and two-way relationships; you need speed, but not consistency: use Redis.
Product listings for selling movies and books, which have different properties (products are pretty much non-relational): use MongoDB.

He wrote the example in about 3 hours, so integrating multiple data stores can be done quickly and can work.
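The tag-cloud MapReduce example from the talk can be sketched in plain, single-process Python. This only illustrates the map and reduce steps; it is not the API of Riak, MongoDB or CouchDB, and the document layout (a "tags" list per post) is an assumption:

```python
from collections import defaultdict
from itertools import chain


def map_tags(document):
    """Map phase: emit (tag, 1) for every tag occurrence in one document."""
    return [(tag, 1) for tag in document["tags"]]


def reduce_counts(pairs):
    """Reduce phase: add up the emitted counts per tag."""
    counts = defaultdict(int)
    for tag, count in pairs:
        counts[tag] += count
    return dict(counts)


# Hypothetical documents; in a real store the map step runs in parallel
# across many nodes and the reduce step merges the partial results.
posts = [
    {"title": "a", "tags": ["nosql", "redis"]},
    {"title": "b", "tags": ["nosql", "mongodb"]},
    {"title": "c", "tags": ["sql"]},
]
tag_cloud = reduce_counts(chain.from_iterable(map_tags(p) for p in posts))
print(tag_cloud)  # {'nosql': 2, 'redis': 1, 'mongodb': 1, 'sql': 1}
```

Because each map call only looks at one document, the map phase can be split across machines freely; only the reduce phase needs to see all pairs for a given tag.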
1.1.0 Alpha Release Now Available
Thu, 11 Mar 2010 15:31:58 +0000 - We are excited to announce the availability of the 1.1.0 Alpha release of InfiniDB Community Edition. This is our initial alpha release for 1.1 and is not recommended for production work. New functionality we’ve added with 1.1.0 includes: improved support for queries with multiple types of joins; improved create table performance and a much smaller footprint on disk for the system catalog, newly created tables, and tables that conta...
Liveblogging at Confoo: [not just] PHP Performance by Rasmus Lerdorf
Thu, 11 Mar 2010 14:29:46 +0000 - Most of this stuff is not PHP-specific; Python, Ruby, Java and .NET developers can use the tools in this talk too. The session on joind.in, with user comments/feedback, is at http://joind.in/talk/view/1320. Slides are at http://talks.php.net/show/confoo10.

“My name is Rasmus, I’ve been around for a long time. I’ve been doing this web stuff since 1992/1993.”

“Generally performance is not a PHP problem.” The usual culprits are web servers that aren’t configured, no Expires headers on images, and no favicon. Tools: the Firefox/Firebug extension YSlow (developed by Yahoo) gives your site a grade, and Google has developed the Page Speed tool for Firefox/Firebug.

Today Rasmus will pick on WordPress. He checks out the code, then uses Siege to do a baseline benchmark (see the slide for the results). Before you do anything else, install an opcode cache like APC; WordPress really likes this type of caching (see the slide for the results). Set the timezone, to make sure conversions aren’t being done all the time.

Make sure you are CPU-bound, NOT I/O-bound; otherwise, speed up the I/O. Then strace your web server process. There are common config issues that you can spot in the strace output: grep for ENOENT, which shows you “No such file or directory” errors. Set AllowOverride None to stop Apache from looking for an .htaccess file in every directory, so settings are read just once from your config file (unless you’re an ISP). Make sure DirectoryIndex is set appropriately, and watch your include_path. All this low-hanging fruit has examples on the common config issues slide.

Install pecl/inclued and generate a graph of your includes (the graph image is worth zooming in on). In the strace output, check the open() calls. Conditional includes, function calls that include files, etc. need runtime context before PHP knows what to open. In the example, every request checks whether we have the config file; once the site is configured, we can get rid of that stuff.
Get rid of all the conditionals and hard-code “include wp-config.php”; examples are on the slide. His suggested changes: remove the conditional config include in wp-load.php (as just mentioned), remove the conditional did-header check in wp-blog-header.php, don’t call require_wp_db() from wp-settings.php, and remove the conditional require logic from wp_start_object_cache. Then check strace again. Now all Rasmus sees is theming and translations, which he decided to keep, because that’s the real benefit of WordPress. Performance is all about costs vs. flexibility: you don’t want to get rid of all of your flexibility, but you want to be fast.

Set error_reporting(-1) in wp-settings.php to catch all warnings. PHP error handling is very slow, so warnings slow you down, and getting rid of all errors and warnings will make you faster. (One slide shows the warnings that WordPress throws.)

Look at all the C-level calls made, using callgrind, a tool that runs under valgrind, a CPU emulator used for debugging (see the image of what callgrind shows). Now dive into the PHP executor by installing XDebug. Also check out xhprof, a PECL extension that Facebook open sourced about a year ago. The output is pretty cool; try it on your own site (Rasmus shows you how to use it). It lists functions sorted from most to least expensive. For example, use $_SERVER['REQUEST_TIME'] instead of time(), and use pconnect() if MySQL can handle the number of persistent connections your web servers will hold open, etc.

After you have changed a lot of the stuff above, benchmark again with Siege to see how much faster you are. In this case, not much has been gained so far. So keep going: the blogroll is very slow, and Rasmus gets rid of it by commenting it out in the sidebar.php file. I’d like to see something that makes it “semi-dynamic”, that is, a static file that can be regenerated, since you might want the blogroll but links don’t change every second. At this point we’re out of low-hanging fruit.
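The strace check from earlier in the talk (grep for ENOENT) can also be scripted, so that repeated misses for the same file are counted per request. A minimal Python sketch, assuming strace’s usual one-call-per-line text output; the regular expression is an approximation for open/openat/stat-style calls, not a full strace parser, and the function name is hypothetical:

```python
import re
from collections import Counter

# Matches failed path-based calls in strace output, e.g.
#   open("/var/www/.htaccess", O_RDONLY) = -1 ENOENT (No such file or directory)
#   [pid  1234] openat(AT_FDCWD, "/x", O_RDONLY) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(
    r'^(?:\[pid\s+\d+\]\s+)?(\w+)\((?:AT_FDCWD,\s*)?"([^"]*)".*=\s*-1\s+ENOENT\b'
)


def missing_files(strace_lines):
    """Count (syscall, path) pairs that failed with ENOENT."""
    hits = Counter()
    for line in strace_lines:
        match = FAILED_CALL.match(line)
        if match:
            hits[(match.group(1), match.group(2))] += 1
    return hits


sample = [
    'open("/var/www/.htaccess", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '[pid  1234] stat("/usr/share/php/wp-config.php", 0x7fff) = -1 ENOENT (No such file or directory)',
    'open("/var/www/index.php", O_RDONLY) = 3',
    'openat(AT_FDCWD, "/var/www/.htaccess", O_RDONLY) = -1 ENOENT (No such file or directory)',
]
for (call, path), count in missing_files(sample).most_common():
    print(count, call, path)
```

To use it on a real server, capture a trace with something like `strace -f -o trace.txt -p <pid>` and pass the lines of trace.txt to missing_files(); .htaccess lookups and include_path misses tend to show up at the top.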
HipHop is a PHP-to-C++ converter and compiler, including a threaded, event-driven server that replaces Apache. Rasmus’ slide says “Wordpress is well-suited for HipHop because it doesn’t have a lot of dynamic runtime code. This is using the standard Wordpress-svn checkout with a few tweaks.” Then, of course, benchmark again.

The first time you compile WordPress with HipHop, you give it a list of files to add to the binary; it will complain about PHP code that generates file names, so you do have to fix that kind of stuff. There’s a huge mess of errors the first time you run it (“pages and pages”), and Rasmus had to patch HipHop (and WordPress), but his HipHop changes have been merged back into HipHop, so you should be fine for the most part. Check out the errors: lots of them reveal logical errors like $foo."bar" instead of $foo.="bar", and $foo="bar" instead of $foo=="bar" in an if statement. That is of course nice for your own code, to find those logical errors. (WordPress takes in a $user_ID argument and immediately declares a global $user_ID variable, which overwrites the argument passed in, so you can change the name of the argument.) You can also get rid of some code, such as checks for the existence of the same thing more than once. So it will take a bit of tweaking, but it’s worth it.

There are limitations to HipHop, for example:
It doesn’t support any of the new PHP 5.3 language features.
Private properties don’t really exist under HipHop; they are treated as if they were protected.
You can’t unset variables: unset will clear the variable, but it will still be in the symbol table.
eval and create_function are limited.
Variable variables ($$var) are not supported.
Dynamic defines won’t work: define($name,$value).
get_loaded_extensions(), get_extension_funcs(), phpinfo() and debug_backtrace() don’t work.
Conditional and dynamically created include filenames don’t work as you might expect.
The default unix-domain socket filename isn’t set for MySQL, so connecting to localhost doesn’t work.
HipHop does not support all extensions; see Rasmus’ list of extensions that HipHop supports.

Then Rasmus showed an example using Twit (which he wrote), including the benchmarks. He shows that you can see what’s going on, like 5 MySQL calls on the home page, and what happens when you don’t have a favicon.ico (in yellow).

In summary: “performance is all about architecture” and “know your costs”. Be careful, because some tools (like valgrind and xdebug) are not something you want on production systems. You could capture production traffic and replay it on a dev/testing box, but “you just have to minimize the differences and do your best”.
Writing A Storage Engine for Drizzle, Part 2: CREATE TABLE
Thu, 11 Mar 2010 14:27:34 +0000 - The DDL code paths in Drizzle are increasingly different from MySQL’s. For example, the embedded_innodb StorageEngine CREATE TABLE code path is completely different from what it would have to be for MySQL. There are a number of reasons for this, the primary one being that Drizzle uses a protobuf message to describe the table format instead of several data structures and an FRM file.

We are pretty close to having the table protobuf message format final (there are a few bits left to clean up, but expect them done Real Soon Now (TM)). You can see the definition (which is pretty simple to follow) in drizzled/message/table.proto. Also check out my series of blog posts on the table message (more posts coming, I promise!).

Drizzle allows either your StorageEngine or the Drizzle kernel to take care of storing table metadata. You tell the Drizzle kernel that your engine will take care of metadata itself by passing HTON_HAS_DATA_DICTIONARY to the StorageEngine constructor. If you don’t specify HTON_HAS_DATA_DICTIONARY, the Drizzle kernel stores the serialized Table protobuf message in a “table_name.dfe” file in a directory named after the database. If you have specified that you have a data dictionary, you’ll also have to implement some other methods in your StorageEngine; we’ll cover these in a later post.

If you have ever dealt with creating a table in MySQL, you may recognize this method:

  virtual int create(const char *name, TABLE *form, HA_CREATE_INFO *info)=0;

This is not how we do things in Drizzle. We now have this function in StorageEngine that you have to implement:

  int doCreateTable(Session* session,
                    const char *path,
                    Table& table_obj,
                    drizzled::message::Table& table_message)

The existence of the Table parameter is largely historic, and at some point it will go away. In the Embedded InnoDB engine, we don’t use the Table parameter at all.
Shortly we’ll also get rid of the path parameter, instead having the table schema in the Table message and helper functions to construct path names.

Methods named “doFoo” (such as doCreateTable) mean that there is a method named foo() (such as createTable()) in the base class. It does some base work (such as making sure the table_message is filled out and handling any errors), while the “real” work is done by your StorageEngine in the doCreateTable() method.

The Embedded InnoDB engine goes through the table message and constructs a data structure for the Embedded InnoDB library to create a table. The ARCHIVE storage engine is much simpler: it pretty much just creates the header of the ARZ file, mostly ignoring the format of the table. Your best bet is to look at the code from one of these engines, depending on what type of engine you’re working on. This code, along with the table message definition, should be more than enough.
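The doFoo()/foo() split is the classic template-method (non-virtual interface) pattern. A minimal Python model of it, for illustration only: the class and method names are hypothetical, this is not Drizzle’s C++ API, the table message is reduced to a plain dict, and the ".ARZ" path layout is an assumption:

```python
from abc import ABC, abstractmethod


class TableDefinitionError(Exception):
    """Hypothetical error for an incomplete table definition."""


class StorageEngine(ABC):
    """Toy model of the createTable()/doCreateTable() split."""

    def create_table(self, schema, table_message):
        # Base-class work, done once for every engine: check that the
        # table message is filled out and turn problems into errors.
        if not table_message.get("name") or not table_message.get("fields"):
            raise TableDefinitionError("table message is incomplete")
        return self.do_create_table(schema, table_message)

    @abstractmethod
    def do_create_table(self, schema, table_message):
        """Engine-specific work: actually create the table."""


class ArchiveLikeEngine(StorageEngine):
    """Mimics the ARCHIVE description: write a file header, ignore the format."""

    def __init__(self):
        self.files = {}

    def do_create_table(self, schema, table_message):
        path = f"{schema}/{table_message['name']}.ARZ"
        self.files[path] = b"header"  # placeholder bytes, not a real ARZ header
        return path


engine = ArchiveLikeEngine()
print(engine.create_table("test", {"name": "t1", "fields": ["a", "b"]}))  # test/t1.ARZ
```

The design point is that engines never call do_create_table() directly; callers always go through create_table(), so the shared validation cannot be skipped by any one engine.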
Surveying MySQL’s Popular Storage Engines
Thu, 11 Mar 2010 13:00:35 +0000 - In this month’s Database Journal piece we look at the spectrum of MySQL storage engines available, and examine what some of their strengths and weaknesses are. View the article here: Survey of MySQL Storage Engines
Emulating a 'top' CPU summary using /proc/stat and MySQL
Thu, 11 Mar 2010 08:54:50 +0000 - In my last blog post, I showed how we can get some raw performance information from /proc into the MySQL database using a LOAD DATA INFILE (LDI) command. I've modified that LDI call slightly to set the `other` column to the sum total of the CPU counters for those rows which begin with 'cpu'.

original: other = IF(@the_key like 'cpu%', NULL , @val1);
new: other = IF(@the_key like 'cpu%', user + nice + system + idle + iowait + irq + softirq + steal + guest, @val1);

top provides a useful output that looks something like the following:

top - 04:59:14 up 14 days, 3:34, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 216 total, 1 running, 215 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8172108k total, 5115388k used, 3056720k free, 315180k buffers
Swap: 2097144k total, 0k used, 2097144k free, 3630748k cached

The information I'm currently concerned with presenting is the CPU summary:

Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

In order to emulate this display, we will need to sample two data points from /proc/stat.
1. Load the data from /proc/stat.
2. Sleep 1 second.
3. Load the data again.
4. Compare the values.

You should end up with something similar to the following:

mysql> select * from test.proc_stat where the_key = 'cpu';
+-----+---------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+
| seq | the_key | user   | nice  | system | idle       | iowait | irq  | softirq | steal | guest | other      |
+-----+---------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+
| 1   | cpu     | 440022 | 36207 | 94583  | 1976124562 | 89082  | 858  | 27243   | 0     | 0     | 1976812557 |
| 24  | cpu     | 440024 | 36207 | 94583  | 1976130493 | 89082  | 858  | 27243   | 0     | 0     | 1976818490 |
+-----+---------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+
2 rows in set (0.00 sec)

To display the CPU utilization, run the following query:

select
  100 * ( ( new.user - old.user ) / ( new.other - old.other ) ) user,
  100 * ( ( new.nice - old.nice ) / ( new.other - old.other ) ) nice,
  100 * ( ( new.system - old.system ) / ( new.other - old.other ) ) system,
  100 * ( ( new.idle - old.idle ) / ( new.other - old.other ) ) idle,
  100 * ( ( new.iowait - old.iowait ) / ( new.other - old.other ) ) iowait,
  100 * ( ( new.irq - old.irq ) / ( new.other - old.other ) ) irq,
  100 * ( ( new.softirq - old.softirq ) / ( new.other - old.other ) ) softirq,
  100 * ( ( new.steal - old.steal ) / ( new.other - old.other ) ) steal,
  100 * ( ( new.guest - old.guest ) / ( new.other - old.other ) ) guest
from test.proc_stat old, test.proc_stat new
where new.seq > old.seq
  and old.the_key = 'cpu'
  and new.the_key = old.the_key;

+--------+--------+--------+---------+--------+--------+---------+--------+--------+
| user   | nice   | system | idle    | iowait | irq    | softirq | steal  | guest  |
+--------+--------+--------+---------+--------+--------+---------+--------+--------+
| 0.0337 | 0.0000 | 0.0000 | 99.9663 | 0.0000 | 0.0000 | 0.0000  | 0.0000 | 0.0000 |
+--------+--------+--------+---------+--------+--------+---------+--------+--------+
1 row in set (0.01 sec)

Edit: for completeness' sake, here is the SQL script I execute to load the data from /proc:

CREATE TABLE IF NOT EXISTS test.proc_stat (
  seq tinyint auto_increment primary key,
  the_key char(25) NOT NULL,
  user bigint,
  nice bigint,
  system bigint,
  idle bigint,
  iowait bigint,
  irq bigint,
  softirq bigint,
  steal bigint,
  guest bigint,
  other bigint
);

/*
  MySQL treats consecutive delimiters as separate fields, so some fancy
  footwork is required to load the file successfully. The aggregate 'cpu'
  line (the sum of all the individual CPUs in the system) has two spaces
  after its key, which shifts its values one field to the right. To account
  for this, each row is read into MySQL user variables, and those variables
  are examined to determine which field holds the correct value.
*/
LOAD DATA INFILE '/proc/stat'
IGNORE INTO TABLE test.proc_stat
FIELDS TERMINATED BY ' '
(@the_key, @val1, @val2, @val3, @val4, @val5, @val6, @val7, @val8, @val9, @val10)
SET
  the_key = @the_key,
  user    = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val1, 0), IFNULL(@val2, 0))),
  nice    = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val2, 0), IFNULL(@val3, 0))),
  system  = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val3, 0), IFNULL(@val4, 0))),
  idle    = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val4, 0), IFNULL(@val5, 0))),
  iowait  = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val5, 0), IFNULL(@val6, 0))),
  irq     = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val6, 0), IFNULL(@val7, 0))),
  softirq = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val7, 0), IFNULL(@val8, 0))),
  steal   = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val8, 0), IFNULL(@val9, 0))),
  guest   = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val9, 0), IFNULL(@val10, 0))),
  other   = IF(@the_key LIKE 'cpu%', user + nice + system + idle + iowait + irq + softirq + steal + guest, @val1);
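For comparison, the same top-style arithmetic can be done outside the database. A minimal Python sketch, assuming the nine-counter layout used by the table above (any later fields, such as guest_nice on newer kernels, are ignored); the function names are hypothetical:

```python
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest")


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (the first line of /proc/stat on Linux)."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line):
    """Parse a 'cpu' line into a dict of counters.

    str.split() with no argument collapses runs of whitespace, which sidesteps
    the double space after 'cpu' that LOAD DATA INFILE has to work around.
    Missing trailing fields (older kernels) are treated as 0.
    """
    values = [int(v) for v in line.split()[1:1 + len(CPU_FIELDS)]]
    values += [0] * (len(CPU_FIELDS) - len(values))
    return dict(zip(CPU_FIELDS, values))


def cpu_percentages(old, new):
    """Same arithmetic as the SQL query: 100 * delta(field) / delta(total)."""
    total = sum(new.values()) - sum(old.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {f: 100 * (new[f] - old[f]) / total for f in CPU_FIELDS}


# The two 'cpu' rows from the table above, rebuilt as /proc/stat lines.
old = parse_cpu_line("cpu  440022 36207 94583 1976124562 89082 858 27243 0 0")
new = parse_cpu_line("cpu  440024 36207 94583 1976130493 89082 858 27243 0 0")
pct = cpu_percentages(old, new)
print(round(pct["user"], 4), round(pct["idle"], 4))  # 0.0337 99.9663
```

For a live reading, call read_cpu_line(), sleep for a second (time.sleep(1)), call it again, and pass both parsed results to cpu_percentages(); this reproduces the 0.0337% user / 99.9663% idle result of the SQL version for the sample rows.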
Continuing the journey
Thu, 11 Mar 2010 07:42:46 +0000 - A couple of months ago (December 1st, for those playing along at home) marked five years to the day since I started at MySQL AB (now Sun, now Oracle). A good part of me is really surprised it lasted that long, and other parts are surprised it wasn’t longer. Through MySQL and Sun, I met some pretty amazing people, worked with some really smart ones, and formed really solid and awesome friendships. Of course, not everything was perfect (sometimes not even close), but we did have some fun. Up until November 2008 (that’s 3 years and 11 months, for those playing at home) I worked on MySQL Cluster. I still love the product, and love how much better we’re making Drizzle so it’ll be the best SQL interface to NDB :) The ideas behind Drizzle had been talked about for a while… and with my experience of the internals of the MySQL server, I thought that some change and dramatic improvement was sorely needed. Then, in 2008, Brian created a tree. I was soon sending in patches at night, we announced it to the whole world at OSCON, and it captured a lot of attention. Since November 2008 I’ve been working on Drizzle full time. It was absolutely awesome that I had the opportunity to spend all my days hacking on Drizzle, both directly with fantastic people and for fantastic people. But… the Sun set… which was exciting and sad at the same time. Never fear! There were plenty of places wanting Drizzle hackers (and MySQL hackers). For me, it came down to this: “real artists ship”. While there were other places where I would no doubt be happy and work on something really cool, the question that decided where I should really be was: what is the best way to get Drizzle to a stable release that is suitable for deployment? So, Where Am I Now? Rackspace. Where I’ll again be spending all my time hacking on Drizzle.
SQL syntax with /*! c-style comments in MySQLdump
Thu, 11 Mar 2010 07:27:24 +0000 - In MySQL we have --, /* and /*! comments. This post is mainly about the very basic C-style comments.

/*! : C-style comments in MySQL

We normally see comments in mysqldump output as follows:

/*!40000 ALTER TABLE `a` DISABLE KEYS */;

or

/*!50013 DEFINER=`root`@`localhost` SQL SECURITY DEFINER */

These are actually C-style comments that contain embedded SQL and are treated specially [...]
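MySQL runs the text inside a /*!NNNNN ... */ comment only when the server version is at least NNNNN, where the number encodes major * 10000 + minor * 100 + patch (so 40000 is 4.0.0 and 50013 is 5.0.13); a bare /*! ... */ is always run by MySQL, while other databases see an ordinary comment. A minimal Python sketch that imitates that rule on a statement string, assuming five-digit version numbers as in the examples above; the helper names are hypothetical and this is not how the MySQL parser is implemented:

```python
import re

# /*! optionally followed by a five-digit version, then the embedded SQL.
VERSION_COMMENT = re.compile(r"/\*!(\d{5})?\s*(.*?)\s*\*/", re.S)


def server_version_id(version):
    """'5.0.13' or '5.1.44-log' -> 50013 / 50144 (major*10000 + minor*100 + patch)."""
    numbers = [int(p) for p in version.split("-")[0].split(".")[:3]]
    numbers += [0] * (3 - len(numbers))
    major, minor, patch = numbers
    return major * 10000 + minor * 100 + patch


def effective_sql(statement, version):
    """Keep the body of each /*! comment if the server is new enough, else drop it."""
    current = server_version_id(version)

    def replace(match):
        required = match.group(1)
        if required is None or current >= int(required):
            return match.group(2)
        return ""

    return VERSION_COMMENT.sub(replace, statement)


print(effective_sql("/*!40000 ALTER TABLE `a` DISABLE KEYS */;", "5.1.44"))
# ALTER TABLE `a` DISABLE KEYS;
print(effective_sql("/*!40000 ALTER TABLE `a` DISABLE KEYS */;", "3.23.58"))
# ;
```

This is why mysqldump can wrap newer syntax in version comments: an older server simply sees an empty statement where a newer one sees the real command.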
Proper SQL table alias use conventions
Thu, 11 Mar 2010 07:10:09 +0000 - After seeing quite a few SQL statements over the years, something is bugging me: there is no consistent convention for how to write an SQL query. I’m going to leave formatting and upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I find to be problematic table alias use.

Using the sakila database, take a look at the following queries:

Query #1

SELECT
  R.rental_date, C.customer_id, C.first_name, C.last_name
FROM
  rental R
  JOIN customer C USING (customer_id)
WHERE
  R.rental_date >= DATE('2005-10-01')
  AND C.store_id=1;

The above looks for film rentals made in a specific store (store #1), as of Oct. 1st, 2005.

Query #2

SELECT
  F.title, C.name
FROM
  film AS F
  JOIN film_category AS S ON (F.film_id = S.film_id)
  JOIN category AS C ON (S.category_id = C.category_id)
WHERE
  F.length > 180;

The above lists the title and category for all films longer than three hours.

Query #3

SELECT
  c.customer_id, c.last_name
FROM
  customer c
  INNER JOIN address a ON (c.address_id = a.address_id)
  INNER JOIN (
    SELECT
      c.city_id
    FROM
      city AS c
      JOIN country s ON (c.country_id = s.country_id)
    WHERE
      s.country LIKE 'F%'
  ) s1 USING (city_id)
WHERE
  create_date >= DATE('2005-10-01');

The above lists customers created as of Oct. 1st, 2005, who live in countries starting with an ‘F’. The query could be solved without a subquery, but there’s a good reason why I wrote it this way.

The problems

I used very different conventions in each of the queries, and sometimes within a single query. And I commonly see the same on customers’ sites, what with many programmers doing the SQL coding. Again, I will only discuss table alias conventions; I’ll leave the rest to the reader. Here’s where I see problems:

Query #1: In itself, it looks fine. Rental turns to R, Customer turns to C. I will comment on this a little later, when I give my full opinion.
Query #2: So film turns to F, category turns to C. What should film_category turn into? Out of letters? Let’s just go for S, shall we? But S has nothing to do with film_category. Yet it’s so commonly seen.
Query #2: We’re using the AS keyword now. We didn’t use it before.
Queries #1, #2: Hold on. Wasn’t C taken for customer in Query #1? Now, in Query #2, it stands for category? I’m beginning to get confused.
Query #3: Now aliases are lower case; I was just getting used to them being upper case.
Query #3: But, hey, c is back to customer!
Query #3: Or is it? Take a look at the subquery. There’s another c in there! This time it’s city! And it’s perfectly valid syntax: we actually have two identical aliases in the same query.
Query #3: If I could, I would name country with c as well. But I can’t. So why not throw in s again?
Query #3: And now I don’t even bother using the alias when accessing create_date. Well, there’s no such column in any of the other tables!

Proper conventions

What I find so disturbing is that whenever I read a complex query, I need to go back and forth, back and forth between table aliases (found everywhere in the query) and their declaration point. Such irregularities make queries difficult to read. Any of the above issues could be justified, but I wish to make some suggestions:
Decide whether you’re going for upper or lower case.
Do not use the same alias twice in your query, even if it’s valid.
Aliases do not have to be a single character: film_category may just as well be FC.
Do not use an alias that is hard to interpret: s does not stand for country.
Think ahead: use the same aliases throughout all your queries, as far as you can.
If uniqueness is a problem, use longer aliases: cust instead of c.
The above should make for more organized and readable SQL code. Remember: what one programmer finds to be a very intuitive alias may be unintuitive to another!

My own convention

Simple: I only use aliases when using self joins.
I am aware that this makes queries much longer, what with long table names. I go further than that: I prefer fully qualifying any column whose table could be in question, throughout the query. Yes, it makes the query even longer. I know this does not appeal to many. But there’s no confusion. And it’s easily searchable. And it’s consistent. And, if properly formatted as in the above queries, it is quite readable. Now please join me in asking Oracle whether they can add multi-line Strings to Java, as there are in Python.
[RH]acker
Thu, 11 Mar 2010 00:39:14 +0000 - As I'm sure everyone has figured out by now, I've joined Rackspace, where I will continue to work on Drizzle. I'm honestly thrilled with my new home, and there are a myriad of reasons for that. The one I'm most excited about is that they already are the thing that all the hype said MySQL and Red Hat and IBM wanted to become: a service company.

Rackspace doesn't want you to run Rackspace-Apache or RackspaceDB or EC-Rackspace. They want you to be able to run bog-standard Apache. And Linux. And MySQL. And PHP. And Drizzle. Then, Rackspace wants to be the best at providing you the service you need around those. No ludicrous MySQL Enterprise "we'll sell you a license to a free product, and then bundle with that subscription a piece of non-free monitoring software" upselling. Rackspace actually wants to provide you a valuable service, and they want to do such a good job at it that you will happily pay them to do it.

For developers, there is a wonderful upside to this: Rackspace doesn't want a special internal Rackspace-only version of anything. It has no value that way. They want the good software to be ubiquitous so that they can compete in the service arena. This means that they don't want assignment of copyright. This means they don't have crazy policies about which Free Software projects you can and cannot contribute to. Rackspace goes one step further than "do no evil"... they actually want you to try to improve the state of the art, which goes right to the core of why I'm involved with Free Software in the first place.

I truly believe that Free will always win over Restricted, that Open beats Closed, and that Sharing will always improve the world more than Hoarding. I've always contended that a company can be successful and make the world a better place, and that the two are not mutually exclusive. I am thrilled to now be a part of a company where I can do my best to prove it.
Talking at the University of Utah
Wed, 10 Mar 2010 23:56:00 +0000 - I'm giving a talk at the University of Utah on everything from scaling, clustering, MySQL, MySQL internals and NoSQL (Cassandra) to how to manage all this stuff. If you are at the University, I'm bringing some swag! I will also upload the slides and post them here.
Peter Gulutzan at the O’Reilly MySQL Conference
Wed, 10 Mar 2010 21:47:54 +0000 - I will be giving two talks at the O’Reilly MySQL Conference & Expo in Santa Clara, CA:

Performance Schema: Tuesday April 13, 11:55am.
Demos Of All The Big New Features: Thursday April 15, 11:55am, with Konstantin Osipov.

The other MySQL server engineers giving talks are: Alexander Barkov (globalization), Chuck Bell (backup), Mattias Jonsson (partitions), Mats Kindahl (replication), Konstantin Osipov (runtime), Inaam Rana (InnoDB), Mikael Ronstrom (partitions), Calvin Sun (InnoDB), Lars Thalmann (replication) and Jimmy Yang (InnoDB).

Maybe we’ll have a Birds of a Feather session too.
mk-schema-change? Check out ideas from oak-online-alter-table
Wed, 10 Mar 2010 18:28:29 +0000 - This is a response to Mark Callaghan’s post, mk-schema-change. I apologize for not commenting on the post itself; I do not hold a Facebook account. Anyway, this is a long write-up, so it may as well deserve a post of its own.

Some of the work Mark is describing already exists in openark kit’s oak-online-alter-table. Allow me to explain what I have gained there, and how the issue can be pursued further. There is relevance to Mark’s suggestion.

oak-online-alter-table uses a combination of locks, chunks and triggers to achieve an almost non-blocking ALTER TABLE effect. I had a very short opportunity to speak with Mark at last year’s conference, in between bites. Mark stated that anything involving triggers was irrelevant in his case. The triggers are a pain, but I believe a few other insights from oak-online-alter-table can be of interest.

The first attempt

My first attempt with the script assumed:
- The table has an AUTO_INCREMENT PRIMARY KEY column.
- New rows always gain ascending PRIMARY KEY values.
- The PRIMARY KEY never changes for an existing row.
- PRIMARY KEY values are never reused.
- Rows may be deleted at will.
- No triggers exist on the table.
- No FOREIGN KEYs exist on the table.

So the idea was, when one wants to do an ALTER TABLE:
1. Create a ghost table with the new structure.
2. Read the minimum and maximum PK values.
3. Create AFTER INSERT, AFTER UPDATE and AFTER DELETE triggers on the original table. These triggers propagate the changes onto the ghost table.
4. Working slowly, in small chunks, copy rows within the recorded min-max range into the ghost table. The interesting part is where the script makes sure there is no contradiction between these actions and those of the triggers (whichever came first!). This is largely solved using INSERT IGNORE and REPLACE INTO in the proper context.
5. Working slowly and in chunks again, remove rows from the ghost table which no longer exist in the original table.
6. Once all chunking is complete, RENAME the original table to *_old, and the ghost table in place of the original table.

Steps 4 & 5 are similar in concept to transactional recovery through redo logs and undo logs.

The next attempt

The next phase removed the AUTO_INCREMENT requirement, as well as the “no reuse of PK” one. In fact, the only remaining constraints were:
- There is some UNIQUE KEY on the table which is unaffected by the ALTER operation.
- No triggers exist on the table.
- No FOREIGN KEYs exist on the table.

The steps are in general very similar to those listed previously, only now a more elaborate chunking algorithm is used, with possibly non-integer, possibly multi-column chunking. Also, the triggers take care of changes in UNIQUE KEY values themselves.

mk-schema-change?

Have a look at the wiki pages for OnlineAlterTable*. There is some discussion of concurrency issues and of transactional behavior, which explains why oak-online-alter-table performs correctly. Some of these are very relevant, I believe, to Mark’s suggestion: in particular, making the chunked copy while retaining transactional integrity.

To remove any doubt, oak-online-alter-table is not production ready, or anywhere near it. Use at your own risk. I’ve seen it work, and I’ve seen it crash. I got little feedback and thus little chance to fix things. I also haven’t touched the code for quite a few months now, so I’m a little rusty myself.
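
The ghost-table procedure described above can be illustrated end to end in a single process. This is a minimal sketch, not oak-online-alter-table's code. It uses Python's built-in sqlite3 module as a stand-in for MySQL: SQLite spells INSERT IGNORE as INSERT OR IGNORE, and it supports REPLACE INTO and triggers. The function name online_alter, the *_ghost/*_old table names, the between_chunks hook and the added note column are all hypothetical. Real concurrency, locking and MySQL's atomic RENAME TABLE are not modeled.

```python
import sqlite3


def online_alter(conn, table, chunk_size=2, between_chunks=None):
    """Hypothetical sketch: trigger + chunk 'online ALTER' that adds a `note` column."""
    ghost, old = f"{table}_ghost", f"{table}_old"
    # Step 1: ghost table with the new structure (the hypothetical change adds `note`).
    conn.execute(f"CREATE TABLE {ghost} (id INTEGER PRIMARY KEY, name TEXT, note TEXT DEFAULT '')")
    # Step 2: record the PK range that exists before the triggers are in place.
    lo, hi = conn.execute(f"SELECT MIN(id), MAX(id) FROM {table}").fetchone()
    # Step 3: triggers propagate concurrent changes onto the ghost table.
    conn.execute(f"CREATE TRIGGER {table}_ai AFTER INSERT ON {table} BEGIN "
                 f"REPLACE INTO {ghost} (id, name) VALUES (NEW.id, NEW.name); END")
    conn.execute(f"CREATE TRIGGER {table}_au AFTER UPDATE ON {table} BEGIN "
                 f"REPLACE INTO {ghost} (id, name) VALUES (NEW.id, NEW.name); END")
    conn.execute(f"CREATE TRIGGER {table}_ad AFTER DELETE ON {table} BEGIN "
                 f"DELETE FROM {ghost} WHERE id = OLD.id; END")
    if lo is not None:
        # Step 4: copy the recorded range in chunks. INSERT OR IGNORE never
        # overwrites a row that a trigger has already written (a newer version).
        for start in range(lo, hi + 1, chunk_size):
            end = min(start + chunk_size - 1, hi)
            conn.execute(f"INSERT OR IGNORE INTO {ghost} (id, name) "
                         f"SELECT id, name FROM {table} WHERE id BETWEEN ? AND ?", (start, end))
            if between_chunks is not None:
                between_chunks(conn)  # simulates application writes between chunks
        # Step 5: remove ghost rows that no longer exist in the original table.
        for start in range(lo, hi + 1, chunk_size):
            end = min(start + chunk_size - 1, hi)
            conn.execute(f"DELETE FROM {ghost} WHERE id BETWEEN ? AND ? AND id NOT IN "
                         f"(SELECT id FROM {table} WHERE id BETWEEN ? AND ?)",
                         (start, end, start, end))
    # Step 6: drop the triggers, then swap the tables by renaming.
    for suffix in ("ai", "au", "ad"):
        conn.execute(f"DROP TRIGGER {table}_{suffix}")
    conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
    conn.execute(f"ALTER TABLE {ghost} RENAME TO {table}")
    conn.commit()
```

Suppose online_alter is called on a table t, with a between_chunks callback that inserts, updates and deletes rows. Table t should then end up with the new note column and every concurrent change applied, while the pre-change data stays in t_old.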
Google Summer of Code projects, Drizzle
Wed, 10 Mar 2010 18:25:25 +0000 - I've been doing Google Summer of Code projects with students since its creation. As far as intern programs go, it has been one of the most successful I have ever worked with. Last year was particularly awesome in that, with Drizzle, we were able to have students work on projects that made it back into Drizzle. While I have always seen good work created, it has always been hit or miss whether the student's work made it back into the project. Last year, though, we got more code in than ever before, and I believe this year will be the same. We have had students go on to jobs thanks to the work they did on Drizzle. Interning gives you real experience, and it provides resume material which differentiates students who are going on to work in the software engineering field. Working on open source means that you have real experience on your resume, experience that an employer can see. There are many positions open in the Drizzle/MySQL ecosystem, and students with real-world experience should have an easy time finding work with the knowledge they gain from this program. For Drizzle we have worked out a partial list for this year: http://drizzle.org/wiki/Soc Don't see anything you like? I am happy to add new projects, or to work with students on libmemcached or Gearman. Are you interested in working on a different project? Apache, Linux, Postgres? Talk to those projects and ask them to participate, or suggest project ideas to them.
mk-schema-change
Wed, 10 Mar 2010 15:05:40 +0000 - I want a tool to make some long-running schema changes almost non-blocking. They should block access to a table for no more than a few seconds. I also want to do some of these in place on a master, rather than on a slave that has been taken offline. I think this will work for most schema changes. It doesn't have to work for all of them, and there are restrictions: this will not work when statements that modify the table being changed reference other tables, and those other tables are modified during the schema change. If production SQL cannot be changed to meet this restriction, then the schema change can be done on a slave that has been taken offline. Is anyone else interested in such a tool?

A hand-waving description of the process is:
1. Create the new table on the master. The new table might use MyISAM without indexes initially, to make the insert as fast as possible and reduce the load on InnoDB.
2. Run set sql_log_bin=0, as what follows should not be written to the binlog.
3. Run start transaction with consistent innodb snapshot to start an InnoDB transaction and get the current binlog offset of the master.
4. Run insert into new_table select * from original_table on the master. Alas, this will get a transaction-duration read lock on every row in original_table unless you use row-based replication, hack InnoDB, or set innodb_locks_unsafe_for_binlog.
5. Convert new_table to InnoDB and create indexes on it.
6. Replay changes from the binlogs after the point in time recorded in step #3. This should extract changes to original_table and replay them against new_table.
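
The final replay step can be modeled in a few lines. It applies only the changes logged after the offset captured with the snapshot, and only those for the table being altered. This is a toy sketch under stated assumptions. The Event record and the replay function are hypothetical and use already-decoded row images. Parsing real binlog events (for example via mysqlbinlog) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Event:
    """Hypothetical, simplified row-change event; real binlog events are richer."""
    pos: int                     # binlog offset at which the change was logged
    table: str
    op: str                      # "insert", "update" or "delete"
    pk: int
    row: Optional[dict] = None   # new row image for insert/update


def replay(new_table: Dict[int, dict], events: Iterable[Event],
           snapshot_pos: int, table: str) -> Dict[int, dict]:
    """Apply changes to `table` that were logged strictly after the snapshot offset."""
    for ev in events:
        # Changes at or before the snapshot are already in the bulk copy;
        # changes to other tables are irrelevant to this schema change.
        if ev.pos <= snapshot_pos or ev.table != table:
            continue
        if ev.op == "delete":
            new_table.pop(ev.pk, None)
        else:
            # Treat insert and update alike (upsert) so replay is idempotent.
            new_table[ev.pk] = dict(ev.row)
    return new_table
```

Treating inserts and updates as upserts means replaying an event twice is harmless. That matters if the replay has to be restarted.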
Presenting on new MySQL Cluster 7.1 features at MySQL UC (and discount code!)
Wed, 10 Mar 2010 10:12:06 +0000 - Together with Berndt, I’ll be presenting the new features in MySQL Cluster 7.1 at this year’s MySQL User Conference in Santa Clara, on April 12th. If you’re interested in using MySQL Cluster but aren’t sure how to get started (or you’ve used it but would like some tips), then this is a great opportunity. Check out the presentation description. If you register by 15 March you get the early-bird price, and if you use this ‘friend of a speaker’ code you get an additional 25% off: mys10fsp
MySQL Cluster on Windows – webinar replay available
Wed, 10 Mar 2010 09:58:13 +0000 - If you missed the recent webinar on running MySQL Cluster on Windows then you can watch/listen to the replay at http://www.mysql.com/news-and-events/on-demand-webinars/display-od-517.html
Things to monitor on MySQL, the user’s perspective
Wed, 10 Mar 2010 09:12:24 +0000 - Working on mycheckpoint, I intend to add custom monitoring, that is, letting the user define things to monitor. I have my own thoughts, but I would be grateful for more input! What would the user want to monitor? Monitoring the number of SELECT statements per second, InnoDB locks, slave replication lag, etc. is very important, and monitoring utilities provide this information. But what does that tell the end user? Not much. The experienced DBA may gain a lot; the user would be more interested in a completely different kind of information. In between, some information is relevant to both. Say we were managing an online store. We want to monitor the health of the database. But the health of the database is inseparable from the health of the application. I mean, having little to no disk usage is fine, unless… something is wrong with the application, which leads to no new purchases. And so a user would be interested in monitoring the number of purchases per hour, or the time passed since the last successful purchase. This kind of data can only be generated by a user’s specific query. Looking at the charts, the user would then feel safe and confident in the health of the store application. But let’s dig further. We want the store’s website to respond well. In particular, the query which returns the items in a customer’s cart must return quickly. Our user would want to see not only that purchases keep coming in, but also that page load times are quick for those critical parts (such as the cart, in our example). And so a user should be able to monitor the time it takes to execute a given query. It can be of further interest to know how many times per second a given query is executed. This part is not easily done on the server side, and requires the user’s cooperation (or else we must analyze the general log, sniff, or set up a proxy). If the user is willing, she can log to some table each time she executes a certain query. Then we’re back to monitoring a regular table, as in the first example. It is also possible to monitor a query’s execution plan. Is it a full scan? How many rows are expected? But given that we can monitor the time it takes to execute a query, I’m not sure this is useful. If everything runs fast enough, who cares how it executes? Some of the above can be monitored at an altogether higher level: if we’re talking about some web application, then we can use our Apache logs to determine page load times, or the number of requests to our “cart items” page. But we do not always work with web servers, and we may be interested in checking the specific queries behind the scenes.

Summary

Custom monitoring can include:
- User-defined queries (number of concurrent visitors; count of successful operations per second; number of rows per given table or condition; …)
- Execution time for user-defined queries (time it takes to return cart items; find rows matching a condition; sort a table; …)
- Number of executions of a given query, per second.

I intend to incorporate the above into mycheckpoint as part of its standard monitoring scheme. Please share your thoughts below.
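
The three kinds of custom checks in the summary can be sketched as follows: a user-defined scalar query, its execution time, and an execution rate derived from a counter. The sketch assumes a DB-API connection. sqlite3 stands in here for a MySQL connection, and the names sample_checks and executions_per_second are hypothetical, not part of mycheckpoint.

```python
import sqlite3
import time


def sample_checks(conn, checks):
    """Run each user-defined check; return {name: (value, seconds_taken)}.

    `checks` maps a label to a query returning one scalar, e.g. the number
    of purchases in the last hour.
    """
    results = {}
    for name, sql in checks.items():
        started = time.perf_counter()
        value = conn.execute(sql).fetchone()[0]
        results[name] = (value, time.perf_counter() - started)
    return results


def executions_per_second(prev_count, curr_count, interval_seconds):
    """Rate of a counter between two samples, e.g. rows in a query-log table."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (curr_count - prev_count) / interval_seconds
```

Sampling the same count query on a schedule and diffing successive values gives the per-second rate. Here the count comes from the table that the application logs to each time it runs the query of interest.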
It's a cheat! Get Linux performance information from your MySQL database without shell access.
Wed, 10 Mar 2010 09:01:59 +0000 - System administrators familiar with the Linux operating system use the tools in the 'procps' toolset all the time. Tools which read from /proc include top, iostat, vmstat, sar and others. The files in /proc contain useful information about the performance of the system. Most of the files are documented in the Linux kernel documentation; you can also check man 5 proc. Most performance monitoring tools invoke other tools like iostat to collect performance information, instead of reading from the /proc filesystem themselves. This raises the question: what can you do if you don't have access to those tools? Perhaps you are using a hosted Linux database and have no access to the underlying shell to execute tools like iostat or top. How could you gather information about the performance of the actual system without being allowed to run the tools? MySQL includes a command called LOAD DATA INFILE which can read the contents of a delimited text file and store them in a database table. The contents of /proc are world-readable, so your MySQL database should have access to this information as long as it is running on a Linux server (and your account has the FILE privilege that LOAD DATA INFILE requires). Let's start by collecting and reporting on some CPU performance information.

CREATE TEMPORARY TABLE test.proc_stat (
  seq tinyint auto_increment primary key,
  the_key char(25) NOT NULL,
  user bigint,
  nice bigint,
  `system` bigint,
  idle bigint,
  iowait bigint,
  irq bigint,
  softirq bigint,
  steal bigint,
  guest bigint,
  other bigint
);

/*
  MySQL treats consecutive delimiters as separate fields, so some fancy footwork
  is required to load the file successfully. The file starts with a "cpu" line,
  followed by two spaces, which holds the sum over all the individual CPUs in the
  system. To account for this, each row is read into some MySQL variables. Those
  variables are examined to determine which field holds the correct value.
*/
LOAD DATA INFILE '/proc/stat'
IGNORE INTO TABLE test.proc_stat
FIELDS TERMINATED BY ' '
(@the_key, @val1, @val2, @val3, @val4, @val5, @val6, @val7, @val8, @val9, @val10)
SET
  other    = IF(@the_key LIKE 'cpu%', NULL, @val1),
  the_key  = @the_key,
  user     = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val1, 0), IFNULL(@val2, 0))),
  nice     = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val2, 0), IFNULL(@val3, 0))),
  `system` = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val3, 0), IFNULL(@val4, 0))),
  idle     = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val4, 0), IFNULL(@val5, 0))),
  iowait   = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val5, 0), IFNULL(@val6, 0))),
  irq      = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val6, 0), IFNULL(@val7, 0))),
  softirq  = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val7, 0), IFNULL(@val8, 0))),
  steal    = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val8, 0), IFNULL(@val9, 0))),
  guest    = IF(@the_key NOT LIKE 'cpu%', NULL, IF(@the_key != 'cpu', IFNULL(@val9, 0), IFNULL(@val10, 0)));

Depending on your kernel version, you may get one or more warnings about unexpected numbers of columns. You can safely ignore these.
mysql> select * from test.proc_stat;
+-----+---------------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+
| seq | the_key       | user   | nice  | system | idle       | iowait | irq  | softirq | steal | guest | other      |
+-----+---------------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+
|   1 | cpu           | 378340 | 33588 |  82489 | 1838257830 |  75444 |  750 |   23065 |     0 |     0 |       NULL |
|   2 | cpu0          |   4152 |   125 |   1613 |  114920899 |    624 |    0 |     869 |     0 |     0 |       NULL |
|   3 | cpu1          |   2182 |    78 |   1474 |  114924477 |     50 |    2 |       3 |     0 |     0 |       NULL |
|   4 | cpu2          |   6037 |  5418 |   2289 |  114914024 |     55 |   34 |     401 |     0 |     0 |       NULL |
|   5 | cpu3          |   3519 |    55 |    842 |  114923794 |     37 |    1 |       1 |     0 |     0 |       NULL |
|   6 | cpu4          |  71851 |  5443 |   6656 |  114840363 |   3197 |   11 |     720 |     0 |     0 |       NULL |
|   7 | cpu5          |   2435 |     5 |    801 |  114924963 |     29 |    2 |       0 |     0 |     0 |       NULL |
|   8 | cpu6          | 136246 |  4711 |  36628 |  114690032 |  46119 |   20 |   14471 |     0 |     0 |       NULL |
|   9 | cpu7          |   1119 |     2 |    366 |  114926691 |     40 |    1 |       0 |     0 |     0 |       NULL |
|  10 | cpu8          |   4126 |    34 |   2772 |  114920032 |     92 |    1 |    1153 |     0 |     0 |       NULL |
|  11 | cpu9          |   1618 |     2 |    694 |  114925811 |     77 |    1 |       0 |     0 |     0 |       NULL |
|  12 | cpu10         |  18096 |  8735 |   6823 |  114891588 |    396 |  179 |    2379 |     0 |     0 |       NULL |
|  13 | cpu11         |   7243 |  2583 |   3559 |  114914559 |    241 |    1 |       2 |     0 |     0 |       NULL |
|  14 | cpu12         |   5215 |  2380 |   2776 |  114915814 |    417 |  342 |    1237 |     0 |     0 |       NULL |
|  15 | cpu13         |   3224 |    28 |   1507 |  114923336 |     77 |    2 |       0 |     0 |     0 |       NULL |
|  16 | cpu14         | 109818 |  3979 |  13071 |  114775431 |  23901 |  143 |    1823 |     0 |     0 |       NULL |
|  17 | cpu15         |   1450 |     1 |    612 |  114926010 |     83 |    1 |       0 |     0 |     0 |       NULL |
|  18 | intr          |   NULL |  NULL |   NULL |       NULL |   NULL | NULL |    NULL |  NULL |  NULL | 1176485951 |
|  19 | ctxt          |   NULL |  NULL |   NULL |       NULL |   NULL | NULL |    NULL |  NULL |  NULL |  171220339 |
|  20 | btime         |   NULL |  NULL |   NULL |       NULL |   NULL | NULL |    NULL |  NULL |  NULL | 1267061074 |
|  21 | processes     |   NULL |  NULL |   NULL |       NULL |   NULL | NULL |    NULL |  NULL |  NULL |     168510 |
|  22 | procs_running |   NULL |  NULL |   NULL |       NULL |   NULL | NULL |    NULL |  NULL |  NULL |          1 |
|  23 | procs_blocked |   NULL |  NULL |   NULL |       NULL |   NULL | NULL |    NULL |  NULL |  NULL |          0 |
+-----+---------------+--------+-------+--------+------------+--------+------+---------+-------+-------+------------+
23 rows in set (0.00 sec)

Now that you can collect this information, you can emulate top to calculate the current total CPU usage. I'll show you how to do that in my next blog post.
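
A top-style CPU figure is usually derived from two snapshots of the aggregate "cpu" line. The busy percentage is the growth in non-idle time divided by the growth in total time. The sketch below shows that common approach on raw /proc/stat text. It is not necessarily the method of the follow-up post, and the function names are hypothetical. Unlike FIELDS TERMINATED BY ' ', Python's str.split() collapses the double space after "cpu". The guest columns are left out because Linux already counts guest time inside user/nice.

```python
def read_cpu_totals(proc_stat_text):
    """Return the first eight counters of the aggregate 'cpu' line."""
    for line in proc_stat_text.splitlines():
        fields = line.split()  # split() collapses the double space after 'cpu'
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal; guest fields are
            # left out because Linux already counts guest time in user/nice.
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_percent(before_text, after_text):
    """CPU busy percentage between two /proc/stat snapshots."""
    before, after = read_cpu_totals(before_text), read_cpu_totals(after_text)
    n = min(len(before), len(after))
    deltas = [after[i] - before[i] for i in range(n)]
    total = sum(deltas)
    # idle (index 3) plus iowait (index 4, absent on very old kernels)
    idle = deltas[3] + (deltas[4] if n > 4 else 0)
    return 100.0 * (total - idle) / total if total else 0.0
```

On a Linux host, you could read open("/proc/stat").read() twice, a second or so apart, and pass both strings in. Through MySQL, the same arithmetic applies to two loads of the test.proc_stat row whose the_key is 'cpu'.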
Do you need more data in the slow query log?
Wed, 10 Mar 2010 03:04:06 +0000 - Imagine you tried to use the slow query log to debug a performance problem. Does the current format have enough details?

# Time: 100309 18:48:23
# User@Host: root[root] @ localhost []
# Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 1

I have added Thread_id, Errno, Start and End. Thread_id can be used to find similar data from SHOW PROCESSLIST and the binlog. Errno is useful in many cases. Start and End are there for convenience. Can you suggest anything else that would be easy to add? Note that Rows_sent and Rows_examined are always zero for insert, update and delete statements. Feature request 49756 is open to change that. Maybe that is easy to fix.

# Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 1 \
Thread_id: 3 Errno: 0 Start: 18:48:23 End: 18:48:23

Update: I found more data that is easy to add, and the proposed output is:

# Time: 100310 7:51:28
# User@Host: root[root] @ localhost []
# Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 0 Thread_id: 1 Errno: 0 \
Killed: 0 Bytes_received: 104 Bytes_sent: 161 Read_first: 0 Read_last: 0 Read_key: 0 \
Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 \
Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 \
Created_tmp_tables: 0 Start: 7:51:28 End: 7:51:28
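
Extra fields only help if tools can consume them. The proposed header is a flat run of "Name: value" pairs, so a generic parser keeps working as fields are added. This is a minimal sketch assuming exactly the layout proposed above, with backslash continuations and no spaces inside values. parse_slow_log_stats is a hypothetical helper, not part of any MySQL tool.

```python
import re

_PAIR = re.compile(r"(\w+): (\S+)")


def _coerce(value):
    """Convert numeric strings to int/float; leave others (e.g. '7:51:28') as str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_slow_log_stats(header):
    """Collect 'Name: value' pairs from the '# Query_time:' part of an entry header.

    Backslash-continued lines are joined first; '# Time:' and '# User@Host:'
    lines are skipped because their values contain spaces.
    """
    joined = header.replace("\\\n", " ")
    stats = {}
    for line in joined.splitlines():
        if line.startswith("# Time:") or line.startswith("# User@Host:"):
            continue
        for key, value in _PAIR.findall(line):
            stats[key] = _coerce(value)
    return stats
```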
Speaking at MySQL Conference: The Thinking Person's Guide to Data Warehouse Design
Wed, 10 Mar 2010 02:37:42 +0000 - I'll be presenting "The Thinking Person's Guide to Data Warehouse Design" at the upcoming MySQL User Conference. While a lot of people think that bad SQL code is the #1 wrecking ball of data warehouses and marts, the fact is that poor database design is the leading cause of both downtime and bad performance. In my presentation, I'll do my best to show how up-front wor...

SQL Server Hosting Toolkit The goal of the SQL Server Hosting Toolkit is to enable a great experience around SQL Server in shared hosting environments. The toolkit will eventually consist of a suite of tools and services that hosters can deploy for use by their customers. It will also serve as an incubation vehicle for tools that hosting customers can download and use directly, regardless of whether their hoster has deployed the toolkit. See the Project Roadmap for details on where we're going.

MySQL 5.0 Reference Manual

MySQL Installation Using a Source Distribution

Full-Text Search Functions

The Full-Text Stuff That We Didn't Put In The Manual MySQL Full Text Search in MySQL 5.1: New Features and How To

MySQL Full Text Search 

Text Stopwords The default list of full-text stop words.

MySQLMan is a web-based database manager. It allows you to perform common maintenance and administration tasks in MySQL (MySQL is a great, mostly free SQL database server). MySQLMan was based on phpMyAdmin, but written in Perl. It allows you to do common tasks like: browse/create/drop databases; browse/search/create/drop/alter tables; import/export data; add/remove/alter table columns; add/remove/alter table keys, etc.

MySQL Forge Resources for the MySQL Community

My SQL Full Text Blogspot 

WAMP (Windows, Apache, MySQL & PHP). With WAMP installed, you can run a web server (and things like WordPress, MediaWiki, and Jinzora) on your Windows PC. Read this: How to install WAMP

JLBN Free WAMP Guides & Website Design Templates.

XAMPP is an easy to install Apache distribution containing MySQL, PHP and Perl. XAMPP is really very easy to install and to use - just download, extract and start. From Apache Friends

Toad for MySQL empowers MySQL developers and administrators to develop code more efficiently. It also provides utilities to compare, extract and search for objects, manage projects, import/export data and administer the database. Toad for MySQL increases developer productivity and offers access to a solid community of experts and peers for interactive support.

Sphinx (SQL Phrase Index) is a free, open-source SQL full-text search engine. It provides fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently, built-in data sources support fetching data either via a direct connection to MySQL or PostgreSQL, or using an XML pipe mechanism. As we know, MySQL's built-in full-text search is currently limited to the MyISAM storage engine and has a few other limits. Today the Sphinx Search plugin for MySQL was released, which provides a fast and easy-to-use full-text search solution for all storage engines. This version also adds a lot of other new features, including boolean search and distributed searching.

e107 is a content management system written in PHP and using the popular open source MySQL database system for content storage. It's completely free and totally customisable, and in constant development.

PhotoPost is written in highly optimized PHP code and uses a lightning fast MySQL database backend. PhotoPost uses either ImageMagick™ or the GD Graphics Library to resize uploaded images and create thumbnails. Chances are, your web host already has either ImageMagick or GD installed on your server, so be sure to check with them if you don't know if you have one or both installed. More Graphics, Graphics file formats Video and Images

phpBB is a high powered, fully scalable, and highly customizable Open Source bulletin board package. phpBB has a user-friendly interface, simple and straightforward administration panel, and helpful FAQ. Based on the powerful PHP server language and your choice of MySQL, MS-SQL, PostgreSQL or Access/ODBC database servers, phpBB is the ideal free community solution for all web sites.

Maatkit: tools for SQL. Maatkit makes MySQL easier and safer to manage. It provides simple, predictable ways to do things you cannot otherwise do. It would be nice if these features were included with MySQL, but they are not. That's why Maatkit now ships by default with many GNU/Linux distributions, such as Debian and CentOS. You can use Maatkit to prove replication is working correctly, fix corrupted data, automate repetitive tasks, speed up your servers, and much, much more. Maatkit was formerly known as MySQL Toolkit. The toolkit contains essential command-line utilities for MySQL, such as a table checksum tool and a query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability. It is a set of essential tools for MySQL users, developers and administrators; the project's goal is to make high-quality command-line tools that follow the UNIX philosophy of doing one thing and doing it well. They are designed for scriptability and ease of processing with standard command-line utilities such as Awk and Sed.
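
Checking a slave for data consistency is usually done by checksumming the table in chunks on both servers and comparing the results. Only chunks that differ then need closer inspection. The sketch below shows that general idea in Python. It is not Maatkit's implementation, which computes its checksums with SQL on each server. It assumes rows are supplied as (primary_key, values) tuples, and chunk_checksums and differing_chunks are hypothetical names.

```python
import zlib


def chunk_checksums(rows, chunk_size):
    """Map chunk number -> XOR of per-row CRC32 values.

    `rows` is an iterable of (primary_key, values_tuple); chunk = pk // chunk_size.
    XOR makes the result independent of row order within a chunk.
    """
    sums = {}
    for pk, values in rows:
        crc = zlib.crc32(repr((pk, values)).encode("utf-8"))
        chunk = pk // chunk_size
        sums[chunk] = sums.get(chunk, 0) ^ crc
    return sums


def differing_chunks(master_rows, replica_rows, chunk_size=1000):
    """Chunks whose checksums differ (or that exist on only one side)."""
    a = chunk_checksums(master_rows, chunk_size)
    b = chunk_checksums(replica_rows, chunk_size)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Including the primary key in each row's checksum means two rows with identical values in one chunk cannot cancel each other out under XOR.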

Automatic MySQL Backup. Back up multiple MySQL databases with one script (it can now back up ALL databases on a server easily; you no longer need to specify each database separately). Back up all databases to a single backup file, or to a separate directory and file for each database. Automatically compress the backup files to save disk space, using either gzip or bzip2 compression. Can back up remote MySQL servers to a central server. Runs automatically using cron, or can be run manually. Can e-mail the backup log to any specified e-mail address instead of "root" (great for hosted websites and databases). Can e-mail the compressed database backup files to the specified e-mail address. Can specify the maximum backup size to e-mail. Can be set to run PRE and POST backup commands. Choose which day of the week to run weekly backups.
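
The "separate directory and file for each database, gzip-compressed" layout described above is easy to sketch. This version is a hypothetical illustration, not the Automatic MySQL Backup script. The dump itself is delegated to a caller-supplied dump_fn, so the function can be tried without a server. Names and the <db>_<date>.sql.gz file pattern are assumptions.

```python
import gzip
import os
from datetime import date


def backup_databases(databases, dump_fn, out_dir, day=None):
    """Write one gzip-compressed dump per database; return the file paths.

    `dump_fn(db)` must return the dump as bytes (in practice, e.g. the stdout
    of mysqldump). Files are named <db>_<YYYY-MM-DD>.sql.gz.
    """
    day = day or date.today()
    paths = []
    for db in databases:
        db_dir = os.path.join(out_dir, db)  # separate directory per database
        os.makedirs(db_dir, exist_ok=True)
        path = os.path.join(db_dir, f"{db}_{day.isoformat()}.sql.gz")
        with gzip.open(path, "wb") as fh:
            fh.write(dump_fn(db))
        paths.append(path)
    return paths
```

With a real server, dump_fn could wrap mysqldump, for example `lambda db: subprocess.run(["mysqldump", "--single-transaction", db], capture_output=True, check=True).stdout`. Credentials and connection options are not shown. Scheduling is left to cron, as in the script described above.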

MySQL Dump. Use MySQL Dump to back up your MySQL databases, both structure and data. The script runs from the command prompt only, so you can use crontab or another system scheduler to fully automate your data backup process. The script browses all the databases that you select and writes SQL statements for creating tables and inserting data into files (one for each database). As an option, the dumps may be compressed into a zip archive. Finally, the output files are stored in the directory that you selected.

MySQL online help

Transfer Data from/to SQL Server, DB2, Sybase, MySQL and other DB's

Database-SQL-RDBMS HOW-TO Document for Linux (PostgreSQL Object Relational Database System)

O'Reilly net databases

Toad® for MySQL Freeware - Provides a comprehensive solution for MySQL professionals to create and execute queries, as well as build and manage database objects. You'll benefit from the project manager, the formatting feature, version control integration, the database browser, the security manager and an extensive knowledge base called Knowledge Xpert for MySQL.

Sql-Articles. This site is intended to publish articles related to SQL Server, and it's a free resource. All the latest developments in the world of SQL Server will be available. Feel free to post your suggestions in the suggestion tab. You can also contribute any kind of tips relating to SQL Server in the Tips tab. The sole purpose of starting this website is to create a knowledge base and help all the newbies in SQL.

MyAccess is an AddIn for MS Access which allows you to manage MySQL databases from within Access.

UtterAccess Discussion Forum  Microsoft® Access, Excel, Word, Outlook®, Visual Basic®, SQL Server®, Office online help discussion forums. And... many more!

Migrating from Microsoft Access to MySQL

MySQL Migrating from Microsoft SQL Server, Access, or another database to MySQL

Migration of Access data to MySQL - Tutorials - Webmaster Stop

Microsoft SQL Server Microsoft Server System

Microsoft SQL Server DBA Survival Guide

Access to MySQL Software

Official MySQL

Official MySQL Documentation

Official MySQL Manual

MySQL FAQ

MySQL Storage Engines:- MySQL Native Storage Engines. MySQL currently offers a number of its own native Storage Engines, including:-

Partner-Developed Storage Engines. MySQL Partners are actively developing Storage Engines that are optimized for specific application domains:-

Community-Developed Storage Engines. MySQL's community of open source developers is also developing Storage Engines that are optimized for specific application domains:-

Custom Storage Engines. MySQL's Customers are also developing customized in-house Storage Engines to address their specific needs:-

For more information about the MySQL Storage Engine Partner Program, please contact MySQL. MySQL - InnoDB vs MyISAM

MySQL migration: MyISAM to InnoDB     Restrictions on InnoDB Tables Warning: Do not convert MySQL system tables in the mysql database from MyISAM to InnoDB tables! This is an unsupported operation. If you do this, MySQL does not restart until you restore the old system tables from a backup or re-generate them with the mysql_install_db script.
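
A common way to honor that warning during a bulk MyISAM-to-InnoDB migration is to generate the ALTER TABLE statements from a list of tables and filter out the system schemas first. This minimal sketch does that. It assumes the input rows come from information_schema.TABLES (the TABLE_SCHEMA, TABLE_NAME and ENGINE columns). The function name and the exact set of skipped schemas are assumptions. The generated statements should be reviewed before they are run.

```python
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema"}


def innodb_conversion_statements(tables):
    """Build ALTER TABLE ... ENGINE=InnoDB statements for MyISAM user tables.

    `tables` is an iterable of (schema, table, engine) tuples, e.g. the rows of
    SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES.
    System schemas are always skipped, per the warning above.
    """
    stmts = []
    for schema, table, engine in tables:
        # Views report ENGINE as NULL, so treat a missing engine as "not MyISAM".
        if schema.lower() in SYSTEM_SCHEMAS or (engine or "").upper() != "MYISAM":
            continue
        # Backtick-quote identifiers, doubling any embedded backticks.
        quoted = ".".join("`" + name.replace("`", "``") + "`" for name in (schema, table))
        stmts.append(f"ALTER TABLE {quoted} ENGINE=InnoDB;")
    return stmts
```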

MySQL Gotchas from SQL-info.de

Build Your Own Database Driven Website using MySQL  (PHP)

Setting up a MySQL Based Website (using Perl)

MySQL Tutorial (PHP)

MySQL Basics -- A Helpful MySQL Tutorial

Complete List Of MySQL Related (PHP) Commands

MySQL Connector/ODBC (also known as MyODBC) allows you to connect to a MySQL database server using the ODBC database API on all Microsoft Windows and most Unix platforms, including through applications and programming environments such as Microsoft Access, Microsoft Excel, and Borland Delphi.

My Database Support. Oracle, MySQL, SQL Server Database Support. 

Scalable BLOB Streaming infrastructure for MySQL will transform MySQL into a scalable media server capable of streaming pictures, films, MP3 files and other binary and text objects (BLOBs) directly in and out of the database. On this site you will find all information relating to the ongoing activities of this project. The development is led by PrimeBase Technologies, an open source software company. A BLOB (Binary Large Object) is a data type that stores data as raw binary. Unlike other data types used in databases, such as numbers and strings (which store digits and characters), BLOBs can hold images or other multimedia files because the data is stored in binary form. Note: BLOBs often use more storage than other data types.

MySQL BLOB Types

Protecting Your PHP/MySQL Queries from SQL Injection SQL injection is a serious concern for webmasters, as an experienced attacker can use this hacking technique to gain access to sensitive data and/or potentially cripple your database. If you haven't secured your applications, I implore you to get yourself familiar with the following method and grind it into your coding routine. One unsafe query can result in a nightmare for you or your client.
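
The core of the defense is to never paste user input into SQL text; the value is passed separately as a bound parameter instead. The sketch below contrasts the two approaches. It uses Python's sqlite3 module as a stand-in because it needs no server, and the users table is hypothetical. In PHP, the same idea is expressed with prepared statements in PDO or mysqli.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # VULNERABLE: user input is pasted into the SQL text and parsed as SQL.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn, name):
    # Safe: the value is sent separately as a bound parameter, never parsed as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
```

With the input x' OR '1'='1, the unsafe version's WHERE clause becomes always true and returns every row. The safe version simply looks for a user literally named x' OR '1'='1.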

SQLServerCentral A Microsoft SQL Server community of DBAs, developers and SQL Server users

Google Videos MySQL

Ocelot The Standard SQL DBMS

Informix Dynamic Server (IDS)

Apache open-source software and Apache Servers. Mod Rewrite.

SQL Interpreter

Online SQL Formatter

SQL Code beautifier

SQL-info.de MySQL

PERL DBI

PostgreSQL Main Site

PostgreSQL, Inc.

Practical PostgreSQL Book

PostgreSQL Technical Documentation

PostgreSQL Gotchas

Mini SQL: A Lightweight Database Server

SQLite is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine.
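
"Zero-configuration" means there is no server to install or start. Python's standard library bundles an SQLite binding, so a complete database session fits in a few lines. The links table here is purely illustrative.

```python
import sqlite3

# No server, no configuration: the whole database lives in memory (or in one file).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (title TEXT, topic TEXT)")
conn.executemany("INSERT INTO links VALUES (?, ?)",
                 [("PostgreSQL Gotchas", "PostgreSQL"), ("MySQL FAQ", "MySQL"),
                  ("MySQL BLOB Types", "MySQL")])
counts = conn.execute("SELECT topic, COUNT(*) FROM links GROUP BY topic ORDER BY topic").fetchall()
print(counts)  # [('MySQL', 2), ('PostgreSQL', 1)]
```

Passing a file name such as "links.db" instead of ":memory:" persists the same database to a single file.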

Optimize Your MySQL Databases using cPanel and phpMyAdmin

SQL Server Performance

MySQL Performance Blog:-

32 Tips To Speed Up Your MySQL Queries

MySQL Web Seminars. News and events. On demand webinars. Discover more about the Structured Query Language

MySQL Performance Tuning :-

Everita : MySQL Performance Tuning

Green Computing, Data Centre Efficiency, Virtualization And The Cloud
Wed, 06 Jan 2010 15:09:22 +0000 -

There's been a lot of talk of late about green computing, how data centres are becoming more efficient, virtualization and particularly the idea that cloud computing is some sort of panacea. Yes, that may be so. But you rarely hear anyone mention inefficient software being taken to task: it's the elephant in the room.

Serve Static Drupal Content Faster With Boost And Nginx
Wed, 23 Dec 2009 11:09:01 +0000 -

By Stephen Jayna, 23rd December 2009

How To Speed Up Drupal and/or PHP With XCache
Mon, 21 Dec 2009 15:00:33 +0000 -

By Stephen Jayna, 22nd December 2009

Doubtless most of you do this already, but if you don't, you should at least consider it: install XCache. If you serve pages from Drupal, or more generally with PHP, you could, as I have, increase your PHP throughput by 167% for five minutes of effort.

How To Reduce table_locks_waited In MySQL/MyISAM
Wed, 19 Aug 2009 10:38:13 +0000 -

By Stephen Jayna, 19th August 2009

The scourge of parallelism and scaling everywhere: locking. Or, in MySQL/MyISAM, to be more precise, table locks. Here's an overview of what to look out for and how one might go about reducing the frequency at which they occur.
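
A quick way to gauge MyISAM table-lock contention is to compare the Table_locks_waited and Table_locks_immediate counters from SHOW GLOBAL STATUS. The first counts lock requests that had to wait; the second counts those granted immediately. This minimal sketch computes the share of requests that waited. The function name is hypothetical, and any threshold for "too high" is a judgment call not given here. Both counters accumulate from server start, so passing differences between two samples gives the recent picture.

```python
def table_lock_contention(status):
    """Share of table-lock requests that had to wait.

    `status` maps variable names to values, e.g. parsed from
    SHOW GLOBAL STATUS LIKE 'Table_locks%'.
    """
    waited = int(status["Table_locks_waited"])
    immediate = int(status["Table_locks_immediate"])
    total = waited + immediate
    return waited / total if total else 0.0
```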

How To Speed Up MySQL: An Introduction To Optimizing
Sun, 02 Aug 2009 22:26:42 +0000 -

By Stephen Jayna, 3rd August 2009

Although there is nothing groundbreaking in this document, consider it a gathering together of techniques for your first foray into optimization. We won't discuss the more esoteric methods of squeezing the very last millisecond out of MySQL. There is a myriad of parameters to tune: here's what you need to get right first.

dbforums Database Forums. Covers most types of database, from Adabas to XML & XSLT, and more...

MySQL User Defined Functions. Tutorial on writing your own MySQL User Defined Functions

DB2 Universal Database, IBM, for Linux, UNIX and Windows

Low-Cost Unix Database Differences (1999-08-15)

Oracle Corporation

Oracle FAQ - Home

Oracle O'Reilly

International Oracle Users Group

ITtoolbox Oracle Knowledge Base

SQL Converter. Makes Databases Easy. Convert Excel to SQL in Minutes. SQL Converter 2 for Excel makes databases easy. Start with your familiar Excel spreadsheet and it will generate a MySQL database. Given any file that Excel can read, SQL Converter for Excel will identify the header row and let you select the best data-type for each column. For most files, you can have a MySQL database within three minutes.
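
The core idea described here is to take the first row as column names, pick a type for each column from its values, and emit a CREATE TABLE. It can be sketched with Python's csv module. This is an illustration of the general technique, not SQL Converter's algorithm. The function names and the simple INT/DOUBLE/VARCHAR rule are assumptions, and no BIGINT, DATE or DECIMAL detection is attempted.

```python
import csv
import io


def _column_type(values):
    """Pick a simple MySQL type: INT, else DOUBLE, else VARCHAR sized to the longest value."""
    vals = [v for v in values if v != ""]

    def all_parse(cast):
        try:
            for v in vals:
                cast(v)
            return True
        except ValueError:
            return False

    if vals and all_parse(int):
        return "INT"
    if vals and all_parse(float):
        return "DOUBLE"
    width = max((len(v) for v in vals), default=1)
    return f"VARCHAR({max(width, 1)})"


def create_table_from_csv(name, text):
    """Treat the first row as the header and infer one column type per header."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = []
    for i, col in enumerate(header):
        values = [r[i] if i < len(r) else "" for r in data]
        cols.append(f"`{col}` {_column_type(values)}")
    return f"CREATE TABLE `{name}` ({', '.join(cols)});"
```

For the input "sku,price,name", then "1,9.99,Widget" and "2,15,Gadget Pro", this yields INT, DOUBLE and VARCHAR(10) columns. A real tool would let the user override each guess, as the description above says.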

Wikipedia: MySQL (free encyclopedia.)

Device Tools is a comprehensive and free portal, aimed at providing engineers who develop connected devices all the information needed to make their next design a success. Covers MySQL Databases and low level and high level coding.

LAMP is an acronym for a set of free software programs commonly used together to run dynamic Web sites: Linux, the operating system; Apache, the web server; MySQL, the database management system (or database server); Perl, PHP, and/or Python, scripting languages

Host Library Tutorial is designed to guide you through the initial steps of setting up Apache, MySQL, and PHP on Linux.

LAMP Tutorial: Linux, Apache, MySQL, PHP Introduction.

On LAMP (O'Reilly)

Linux Apache MySQL PHP Web Sites

Linux SQL Databases and Tools

Senna is an embeddable full-text search engine that you can use in conjunction with various scripting languages and databases. Senna is an inverted-index-based engine that combines the best of n-gram indexing and word indexing to achieve fast, precise searches. While the Senna codebase is rather compact, it is scalable enough to handle large amounts of data and queries.

SPARQL (pronounced "SPARkLe") is the query language for the Semantic Web (see Semantic Web use cases). SPARQL queries hide the details of data management, which lowers costs and increases the robustness of data integration on the Web. See SPARQL Query Language for RDF, SPARQL Protocol for RDF, and SPARQL Query Results XML Format.

More XML, Extensible Markup Language

MySQL backup, compress and FTP from a WinForms app: from your Windows Forms application, implement two menu commands: (a) back up the entire MySQL database, compress it and send it via FTP to an FTP server; (b) the reverse: fetch from the FTP server, uncompress, and restore to MySQL.

SiteVault functions as an FTP client and MySQL backup program that lets you browse your backups and restore them with ease. As an FTP program it lets you maintain as many connections as you wish, copy between FTP servers and edit files remotely, and it will run as many transfers simultaneously as you need. It also doubles as an awesome file manager and computer explorer, and its network browsing is the fastest we've seen so far. It'll help you keep your sites safe, your backups clean and your business running. The program is meant for people or organizations running one or more sites, or web developers who need to keep their work safe.

SQL Team:

The SQLTeam.com Weblogs

SQL Server thoughts, code and musings.

ClearTrace Supports Statement Level Events
Fri, 12 Mar 2010 13:26:42 GMT

One of the requests I get on a regular basis is to capture the performance of statement level events.  The latest beta has this feature available.  If you’re interested in this I’d like to get some feedback.

I also fixed an annoying bug where ClearTrace would fail and tell you a value had already been added.  This is a result of the collection I use being case-sensitive and SQL Server not being case-sensitive.  I thought I had properly coded around that but finally realized I hadn’t.  It should be fixed now.

If you have any questions or problems the ClearTrace support forum is the best place for those.

Blog...Dead & New Address
Thu, 11 Mar 2010 18:38:20 GMT

Hi Folks!

For a while I was attempting (probably the correct word) to keep both this blog and my SQL Server Magazine SQL Server BI blog current. Growing B.I. Voyage, speaking, writing and blogging is quite the workload, and this blog suffered as a result. THANK YOU to all who read my blog here on SQLTeam.com; I intend to leave it live for those who might benefit from its content at a later date. My consolidated, single blog can be found at http://www.sqlmag.com/blogs/sqlserverbi.aspx .

Cheers,

Derek

SQL MDS - Updating the Name attribute of member using Staging Table
Thu, 11 Mar 2010 08:20:40 GMT

Creating a member is usually done by populating the member staging table (tblStgMember); during this process you assign a value for the member code and the member name. If you later want to update the member name attribute, add a record to the attribute staging table (tblStgMemberAttribute) with AttributeName = "Name". If you try populating tblStgMember again instead, it will report that the member code already exists.

 

INSERT INTO mdm.tblStgMemberAttribute (ModelName, EntityName, MemberType_ID, MemberCode, AttributeName, AttributeValue)
VALUES (N'Product', N'Product', 1, N'BK-M101', N'Name', N'Updated Member Name Description');

SQLSaturday 33 Observations
Tue, 09 Mar 2010 16:59:22 GMT

Along with a lot of my colleagues, I went to SQLSaturday #33 in Charlotte this last weekend.  Overall a really good event, especially for a first-time organizer.  There is some controversy over certain events where my name got mentioned so I thought I would clear the air.

Before I get to the core controversy, let's get the details out of the way. 

The Microsoft Offices in Charlotte were an excellent venue for this event.  I really appreciated the Microsoft employees that helped out by letting us in and out of normally secure areas.  This is definitely above and beyond on their part.

Thanks to the organizers (especially Greg and Peter) for the great hospitality they showed to the speakers. 

Now for the specifics.  Like most events of this type, there was a raffle at the end for some cool swag.  As a speaker I got raffle tickets just like any other attendee.  The raffle was clearly promoted as "must be present to win".  The problem is that for various reasons, the raffle kicked off immediately after the last speaker finished in the largest room.  That room was across the parking lot from all the other rooms for the event.  I happened to have one of the last sessions of the day, and not in the main room.  I also ran long since the audience was very interactive and there were a lot of follow-up questions.  (BTW, thanks to everyone who came and stayed for my session.  Sorry it cost you the chance to win too.)  My name was drawn for a very nice piece of swag (iPod Touch if you insist).  Since I wasn't there, I didn't win.  Several folks mentioned I was still speaking and was "here" (as in at the event), just not "here in the room".

Yes, I was mad when I found out about it. I think that was handled poorly.  I personally lost out as did my audience (dunno if anyone specific lost anything, but it is the idea that counts).  It was a mistake.

Mistakes happen.  Nobody acted maliciously.  Heck, the guys running the event who made the decision are my friends and remain so.  I got over my mad.  We talked about this privately and we are all OK with what happened.  I am not going to let a gadget get in the way of a couple of good friendships.

I think the mistake was mostly due to a lack of unity between the venue buildings.  Pam Shaw had a similar challenge in Tampa a few weeks ago, including a speaker who ran long on the last session (not me that time).  She had a couple of teenage volunteers acting as gofers/runners.  They counted heads in sessions, pointed people to last-minute room and session changes, and generally helped connect the organizers to what was actually happening.  Note that this was not Pam's first SQLSaturday event.  She knew, but the knowledge had not been institutionalized.  We (the SQL community in general and SQLSaturday organizers in particular) now know how essential gofers are to success.

I know I spent most of this post focusing on the controversy, but I wanted to clear everything up.  I don't want to let a minor mistake, made in good faith, overshadow what was a tremendously good event for the community.

As for the iPod Touch, someone in the SQL community is enjoying it, so it is not a total loss.  And if losing out on it is the price I pay so we can learn this, then that is what a community leader does.  Consider it a gift.  Besides, I really wanted a Zune 120 :)

 

Fixed Bid vs. T&M – Take 2
Tue, 09 Mar 2010 06:08:31 GMT

One of my most popular blog entries of all time is my Contracting Tips: Fixed Bid vs. T&M post from January, 2004.  This post consistently shows up in my referrers list, usually coming from a search engine.  Recently, Brent Ozar (@BrentO) wrote a great argument for why he always bills by the hour (a.k.a. Time & Materials or T&M) which itself was a response to Mark Richman’s (@mrichman) post on why he never bills by the hour (fixed bid).  Each article has good arguments, and I encourage you to read them both and choose the best approach for you.

As for me, my experience parallels Brent’s and I historically have leaned toward the Time & Materials model.

Troubleshooting Application Timeouts in SQL Server
Tue, 09 Mar 2010 01:44:08 GMT

I recently received the following email from a blog reader:

"We are having an OLTP database instance, using SQL Server 2005 with little to moderate traffic (10-20 requests/min). There are also bulk imports that occur at regular intervals in this DB and the import duration ranges between 10secs to 1 min, depending on the data size. Intermittently (2-3 times in a week), we face an issue, where queries get timed out (default of 30 secs set in application). On analyzing, we found two stored procedures, having queries with multiple table joins inside them of taking a long time (5-10 mins) in getting executed, when ideally the execution duration ranges between 5-10 secs. Execution plan of the same displayed Clustered Index Scan happening instead of Clustered Index Seek. All required Indexes are found to be present and Index fragmentation is also minimal as we Rebuild Indexes regularly alongwith Updating Statistics. With no other alternate options occuring to us, we restarted SQL server and thereafter the performance was back on track. But sometimes it was still giving timeout errors for some hits and so we also restarted IIS and that stopped the problem as of now."

Rather than respond directly to the blog reader, I thought it would be more interesting to share my thoughts on this issue in a blog.

There are a few things that I can think of that could cause abnormal timeouts:

  1. Blocking
  2. Bad plan in cache
  3. Outdated statistics
  4. Hardware bottleneck

To determine if blocking is the issue, we can easily run sp_who/sp_who2 or query sysprocesses directly (select * from master..sysprocesses where blocked <> 0).  If blocking is present and persistent, then you'll need to decide whether to kill the parent blocking process.  Killing a process will cause its transaction to roll back, so proceed with caution.  Killing the parent blocking process is only a temporary solution, so you'll need to do a more thorough analysis to figure out why the blocking occurred.  You should look into missing indexes and perhaps consider enabling the READ_COMMITTED_SNAPSHOT database option.
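
The parent (or "head") blocker is the session that blocks others but is not itself blocked. Once you have the spid and blocked columns from sysprocesses, finding it is a small set operation. A minimal Python sketch, assuming the rows have already been fetched as (spid, blocked) pairs; the function name and sample data are hypothetical:

```python
def head_blockers(rows):
    """Return the sorted spids that block other sessions but are not blocked themselves.

    rows: iterable of (spid, blocked) pairs, where blocked is the spid of the
    blocking session, or 0 when the session is not blocked.
    """
    blocked_by = {spid: blocker for spid, blocker in rows}
    blockers = {blocker for blocker in blocked_by.values() if blocker != 0}
    # A head blocker is either not blocked (0) or absent from the snapshot.
    return sorted(spid for spid in blockers if blocked_by.get(spid, 0) == 0)


if __name__ == "__main__":
    # Hypothetical snapshot: 51 blocks 52, which blocks 53; 54 blocks 55.
    sample = [(51, 0), (52, 51), (53, 52), (54, 0), (55, 54), (56, 0)]
    print(head_blockers(sample))  # [51, 54]
```

Investigating (or, as a last resort, killing) spid 51 would release both 52 and 53, while killing 52 would only move the problem.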

The blog reader mentions that the execution plan shows a clustered index scan where a clustered index seek is normal for the stored procedure.  A clustered index scan might have been chosen either because that plan was already in cache or because of out-of-date statistics.  The blog reader mentions that bulk imports occur at regular intervals, so outdated statistics are definitely something that could cause this issue.  The blog reader may need to update statistics after the imports finish if the imports change a lot of data (more than 10%).  If the statistics are good, then the query optimizer might have chosen to scan rather than seek in a previous execution because the scan was estimated to be less costly for the input parameter value supplied when the plan was compiled (often called parameter sniffing).  If this parameter value is rare, then the execution plan in cache is what we call a bad plan.  You want the best plan in cache for the most frequent parameter values.  If a bad plan is a recurring problem on your system, then you should consider rewriting the stored procedure.  You might want to break the code up into multiple stored procedures so that each can have a different execution plan in cache.
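
The 10% rule of thumb could be built into the import job itself. A sketch, assuming the job knows the table's row count before the import and how many rows it inserted or updated; the function name is hypothetical, and the actual refresh would be an UPDATE STATISTICS statement run against the imported table:

```python
def needs_stats_update(rows_before_import, rows_changed, threshold=0.10):
    """Return True when an import changed more than `threshold` of the table's rows."""
    if rows_before_import <= 0:
        # Empty or new table: any loaded rows are a large relative change.
        return rows_changed > 0
    return rows_changed / rows_before_import > threshold


if __name__ == "__main__":
    for before, changed in [(1_000_000, 150_000), (1_000_000, 40_000)]:
        action = "UPDATE STATISTICS" if needs_stats_update(before, changed) else "skip"
        print(before, changed, action)
```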

To remove a bad plan from cache, you can recompile the stored procedure (for example with sp_recompile).  An alternative is to run DBCC FREEPROCCACHE, which clears the entire procedure cache.  Recompiling the affected stored procedures is the better choice: clearing the procedure cache affects all plans in cache rather than just the bad ones, so there will be a temporary performance penalty until the plans are compiled and cached again.

To determine if there is a hardware bottleneck occurring such as slow I/O or high CPU utilization, you will need to run Performance Monitor on the database server.  Hopefully you already have a baseline of the server so you know what is normal and what is not.  Be on the lookout for I/O requests taking longer than 12 milliseconds and CPU utilization over 90%.  The servers that I support typically are under 30% CPU utilization, but your baseline could be higher and be within a normal range.
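
Once Performance Monitor data is exported, checking it against these thresholds is easy to script. A Python sketch, assuming each sample has already been turned into a dict with hypothetical keys time, io_ms and cpu_pct; note that Perfmon's disk latency counters (such as Avg. Disk sec/Read) report seconds, so they need converting to milliseconds first:

```python
# Rule-of-thumb limits from the text above; replace them with your own baseline.
IO_LATENCY_LIMIT_MS = 12.0
CPU_LIMIT_PERCENT = 90.0


def flag_samples(samples):
    """Return (time, reasons) for each sample that breaches a limit.

    Each sample is a dict with the hypothetical keys "time", "io_ms" and "cpu_pct".
    """
    flagged = []
    for sample in samples:
        reasons = []
        if sample["io_ms"] > IO_LATENCY_LIMIT_MS:
            reasons.append("slow I/O")
        if sample["cpu_pct"] > CPU_LIMIT_PERCENT:
            reasons.append("high CPU")
        if reasons:
            flagged.append((sample["time"], reasons))
    return flagged


if __name__ == "__main__":
    readings = [
        {"time": "10:00", "io_ms": 0.008 * 1000, "cpu_pct": 35.0},  # seconds -> ms
        {"time": "10:01", "io_ms": 0.025 * 1000, "cpu_pct": 95.0},
        {"time": "10:02", "io_ms": 12.0, "cpu_pct": 91.0},
    ]
    for time, reasons in flag_samples(readings):
        print(time, ", ".join(reasons))
```

Comparing against your own baseline rather than fixed numbers is the better long-term approach; the constants are only a starting point.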

If restarting the SQL Server service fixes the problem, then the problem was most likely due to blocking or a bad plan in the procedure cache.  Rather than restarting the SQL Server service, which causes downtime, the blog reader should instead analyze the above mentioned things.  Proceed with caution when restarting the SQL Server service as all transactions that have not completed will be rolled back at startup.  This crash recovery process could take longer than normal if there was a long-running transaction running when the service was stopped.  Until the crash recovery process is completed on the database, it is unavailable to your applications.

If restarting IIS fixes the problem, then the problem might not have been inside SQL Server.  Prior to taking this step, you should do analysis of the above mentioned things.

If you can think of other reasons why the blog reader is facing this issue a few times a week, I'd love to hear your thoughts via a blog comment.

My first encounter with SmartAssembly
Wed, 03 Mar 2010 10:11:15 GMT

Let me start by saying that I am a very experienced VB6 programmer but have very little experience with VB.NET, so I probably need more time to learn SmartAssembly.

SmartAssembly makes obfuscating and merging DLL files a piece of cake! With its simple, straightforward and clean GUI, I got my tests working. With other obfuscators like Xenocode and Salamander, which let you (and in some cases force you to) control more advanced settings, you really have to know what you are doing, especially when it comes to protecting code that uses external dependencies.

My most annoying experience is that if you start checking radio buttons and activating different obfuscation features in SmartAssembly, you can end up breaking your working code if, like me, you are not that experienced and don't know what you're doing.

SmartAssembly has some troubleshooting information on its website that explains why the application will fail in some scenarios. So why not build these checks into a deeper analysis stage for the DLLs? By doing that, I think more people could get fully functional DLLs out of the box instead of trying different settings and then testing the protected DLL to see whether it works.

//Peter

Making Money from your SQL Server Blog
Mon, 01 Mar 2010 14:47:50 GMT

My SQL Server blog reading list is around one hundred blogs.  Many people are writing great content and generating lots of page views.  I see some of them running Google AdSense and trying to make a little money off their traffic.  If you want to earn some extra money from what you’ve written, there are a couple of options, and one new option that I’m announcing here.

Background

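Internet advertising is sold based on a few different pricing schemes: CPM (cost per thousand impressions), CPC (cost per click) and CPA (cost per action, such as a completed sale).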

Darren Rowse at ProBlogger has been writing about blogging and making money off blogs for years.  He has a good introduction to making money on your blog in his “Making Money” section.  If you’re interested in learning more he has a post up titled How to Make More Money From Your Blog in the New Year that links to many of his best posts on the subject.

Google AdSense

This is the most common method for people earning money from their blogging.  It’s easy to set up and administer.  You tell AdSense what size ads you’d like to run and it gives you a little piece of JavaScript to put on your site.  AdSense quickly learns the topics you write about and displays ads that are appropriate for your site.  I typically see ads for hosting, SQL Server tools and developer tools running in AdSense slots.  AdSense pays on a CPC model.  If you translate that back to CPM pricing you’ll see rates from $0.50 to $1.00 CPM.
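
Translating a CPC rate into CPM terms just multiplies the price per click by the number of clicks you get per 1,000 impressions. A small Python sketch; the $0.25 per click and 0.2% click-through rate are made-up inputs chosen only because they land at the low end of the range quoted above:

```python
def effective_cpm(cost_per_click, click_through_rate):
    """Revenue per 1,000 ad impressions for a pay-per-click ad.

    click_through_rate is a fraction: 0.002 means 2 clicks per 1,000 impressions.
    """
    return cost_per_click * click_through_rate * 1000


if __name__ == "__main__":
    # Hypothetical inputs: $0.25 per click at a 0.2% click-through rate.
    print(f"${effective_cpm(0.25, 0.002):.2f} CPM")  # $0.50 CPM
```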

Amazon

While you might not make much money writing books, it’s now possible to make even less helping Amazon sell them.  You can sign up for an Amazon affiliate program.  Each time you send a reader to Amazon through your link and they buy the book, you get a cut of that sale.  This is the CPA model from above.

Amazon can help you build some pretty nice “stores”.  Here’s the SQL Server bookstore I built for SQLTeam.com.  If you’re just putting in a page with books like I’ve done on SQLTeam, you should keep your expectations low.  If you’re writing book reviews or suggesting books on your blog, it really does make sense to set up an Amazon affiliate link.  People are much more likely to buy a book based on a review from a trusted source.  I always try to buy through a referral link if there is one.

Amazon pays about 4% of the price as a referral fee.  You also get credit for anything else they buy while on the site.  I recently had someone buy an iPod nano with their SQL Server book making me an extra $5.60 richer!  Estimating how much you can make is difficult though.  How much attention you draw to the links and book reviews can dramatically affect the earnings.

Private Ad Sales

This is the hardest but potentially most lucrative option.  You sell advertising directly to companies that want to sell things to your readers.  Typically these are SQL Server tool vendors, hosting companies or anyone else that wants to make money off database administrators.  You’ll need contacts at the companies and enough page views to make it worth their while.  You’ll also need software to track page views and clicks, geo-target your ads and smooth out the impressions.  Your earnings are whatever you can negotiate with the companies.

SQL Server Ad Network

For the last couple of years I’ve run any extra ads that I sold on the SQLTeam Weblogs.  You can see an example of that on Mladen’s blog.  The ad in the upper right corner is one that I’m running for him.  (Note: Many of the ads I’m running are geo-targeted to appear only in English-speaking countries.  You may see a different set of ads outside the US, Canada and the UK.  You can also see he has a couple of Google ads on his blog.)  When I run ads on his blog I split the advertising revenue with him.  He makes a little and I make a little.

I recently started to expand this and sell advertising specifically to run on SQL Server-related blogs.  I’m also starting to run ads on non-SQLTeam blogs.  The only way I can sell more advertising is to have more blogs to run it on.  And that’s where you come in.

I’ve created a SQL Server advertising network.  I handle all the ad sales and provide the technology to serve the ads.  I handle collections and payments back to you.  You get paid at the end of each month regardless of when (or if) the advertiser actually pays.  All you need to do is add a small piece of JavaScript to your site to display the ads.

If you’re writing about SQL Server and interested in earning a little money for your site I’d like to talk to you.  You can use the Contact Us page on SQLTeam.com to reach me.  Running advertising on your blog isn’t for everyone.  If you’re concerned about what advertisers might think about certain posts then you might not be a good fit.  For the most part this isn’t an issue.  You’ll also need to have a PayPal account to receive payments.  You probably won’t get rich doing this.  But you can earn extra cash on the side for doing what you would do anyway.  I do know that people have earned enough to buy themselves a nice laptop doing this.

My initial target is blogs with more than 10,000 page views per month.  I expect to pay two to three times what Google pays.  If you have less than 10,000 page views per month but are still interested I’d still like to hear from you.  I may not be able to sign up smaller blogs right away but we’ll get the process started.  If you’re unsure about your traffic Google Analytics is a free tool that provides great reporting on traffic, popular posts and how people find your blog.  If you have any questions or are just curious drop me a line and I’ll try to answer your questions.
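
For a rough sense of scale, monthly earnings are just ad impressions divided by 1,000, times the CPM. A Python sketch under loudly stated assumptions: one ad per page, every page view serving an ad, the $0.50 to $1.00 AdSense-equivalent CPM quoted earlier, and the two-to-three-times multiplier applied to that range. Real results will vary:

```python
def monthly_revenue(page_views, cpm, ads_per_page=1):
    """Estimated monthly earnings: (ad impressions / 1,000) x CPM."""
    impressions = page_views * ads_per_page
    return impressions / 1000 * cpm


if __name__ == "__main__":
    views = 10_000
    low, high = monthly_revenue(views, 0.50), monthly_revenue(views, 1.00)
    print(f"AdSense-level CPM: ${low:.2f} to ${high:.2f}")  # $5.00 to $10.00
    print(f"At 2x to 3x that rate: ${2 * low:.2f} to ${3 * high:.2f}")  # $10.00 to $30.00
```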

Lessons from a SAN Failure
Fri, 26 Feb 2010 14:29:31 GMT

At 1:10AM Sunday morning the main SAN at one of my clients suffered a “partial” failure.  Partial means that the SAN was still online and functioning but the LUNs attached to our two main SQL Servers “failed”.  Failed means that SQL Server wouldn’t start and the MDF and LDF files mostly showed a zero file size.  But they were online and responding and most other LUNs were available. 

I’m not sure how SANs know to fail at 1AM on a Saturday night but they seem to.  From a personal standpoint this worked out poorly: I was out with friends and had had more than a few drinks.  From a work standpoint this was about the best time to fail you could imagine.  Everything was running well before Monday morning.  But it was a long, long Sunday.  I started tipsy, got tired and ended up hung over later in the day.  Note to self: Try not to go out drinking right before the SAN fails.

This caught us at an interesting time.  We’re in the process of migrating to an entirely new set of servers so some things were partially moved.  This made it difficult to follow our procedures as cleanly as we’d like.  The benefit was that we had much better documentation of everything on the server.  I would encourage everyone to really think through the process of implementing your DR plan and document as much as possible.  Following a checklist is much easier than trying to remember at night under pressure in a hurry after a few drinks.

I had a series of estimates on how long things would take.  They were accurate for any single server failure.  They weren’t accurate for a SAN failure that took two servers down.  This wasn’t bad but we should have communicated better.

Don’t forget how many things are outside the database.  Logins, linked servers, DTS packages (yikes!), jobs, service broker, DTC (especially DTC), database triggers and any objects in the master database are all things you need backed up.  We’d done a decent job on this and didn’t find significant problems here.  That said this still took a lot of time.  There were many annoyances as a result of this.  Small settings like a login’s default database had a big impact on whether an application could run.  This is probably the single biggest area of concern when looking to recreate a server.  I’d encourage everyone to go through every single node of SSMS and look for user created objects or settings outside the database.

Script out your logins with the proper SID and already encrypted passwords and keep it updated.  This makes life so much easier.  I used an approach based on KB246133 that worked well.  I’ll get my scripts posted over the next few days.
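
The point of such a script is that each SQL login is recreated with its original SID and already-hashed password, so database users stay mapped and applications keep their passwords. A Python sketch of the statement it needs to emit, assuming SQL Server 2005 or later CREATE LOGIN syntax and that the name, SID, hash and default database were already read from the source server (for example from sys.sql_logins). This is not the KB246133 script itself, and the function names are hypothetical; the default database is included because the post notes how much it mattered:

```python
def hex_literal(data):
    """Format raw bytes as a T-SQL binary literal, e.g. bytes([1, 171]) -> 0x01AB."""
    return "0x" + data.hex().upper()


def bracket(identifier):
    """Quote a SQL Server identifier with [ ], doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def create_login_statement(name, sid, password_hash, default_db="master"):
    """Build a CREATE LOGIN statement that keeps the original SID and password hash.

    sid and password_hash are the raw bytes read from the source server.
    """
    return (
        f"CREATE LOGIN {bracket(name)} "
        f"WITH PASSWORD = {hex_literal(password_hash)} HASHED, "
        f"SID = {hex_literal(sid)}, "
        f"DEFAULT_DATABASE = {bracket(default_db)};"
    )


if __name__ == "__main__":
    # Hypothetical, truncated values; real SIDs and hashes are much longer.
    print(create_login_statement("app_user", bytes([0x01, 0xAB]),
                                 bytes([0x02, 0x00, 0xFF]), "Sales"))
```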

The disaster can cause your DR process to fail in unexpected ways.  We have a job that scripts out all logins and role memberships and writes it to a file.  This runs on the DR server and pulls from the production server.  Upon opening the file I found that the contents were a “server not found” error.  Fortunately we had other copies and didn’t need to try and restore the master database.  This now runs on the production server and pushes the script to the DR site.  Soon we’ll get it pushed to our version control software.
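
A cheap guard against the "server not found" surprise is to validate the generated file before it replaces the last good copy at the DR site. A heuristic Python sketch; the error phrases, file handling and function names are assumptions rather than part of the author's job, and it assumes the script is written as UTF-8 text:

```python
from pathlib import Path

# Phrases that suggest the job captured an error instead of a script (assumed).
ERROR_MARKERS = ("server not found", "could not connect", "login failed")


def looks_like_login_script(text):
    """Heuristic: True if text has CREATE LOGIN statements and no known error phrases."""
    lowered = text.lower()
    if any(marker in lowered for marker in ERROR_MARKERS):
        return False
    return "create login" in lowered


def publish_if_valid(new_script, dr_copy):
    """Copy new_script over dr_copy only if it passes the check; return True if copied."""
    text = Path(new_script).read_text(encoding="utf-8", errors="replace")
    if not looks_like_login_script(text):
        return False  # keep the last good copy at the DR site
    Path(dr_copy).write_text(text, encoding="utf-8")
    return True
```

A failed check should also raise an alert; silently keeping an old copy only helps if someone notices it is getting stale.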

One of the biggest challenges is keeping your DR resources up to date.  Any server change (new linked server, new SQL Server Agent job, etc.) means that your DR plan (and scripts) is out of date.  It helps to automate the generation of these resources if possible.

Take time now to test your database restore process.  We test ours quarterly.  If you have a large database I’d also encourage you to invest in a compressed backup solution.  Restoring backups was the single largest consumer of time during our recovery.  And yes, there’s a database mirroring solution planned in our new architecture.

I didn’t have much involvement in things outside SQL Server but this caused many, many things to change in our environment.  Many applications today aren’t just executables or web sites.  They are a combination of those plus network infrastructure, reports, network ports, IP addresses, DTS and SSIS packages, batch systems and many other things.  These all needed a little bit of attention to make sure they were functioning properly.

Profiler turned out to be a handy tool.  I started a trace for failed logins and kept that running.  That let me fix a number of problems before people were able to report them.  I also ran traces to capture exceptions.  This helped identify problems with linked servers.

Overall the thing that gave me the most problem was linked servers.  In order for a linked server to function properly you need to be pointed to the right server, have the proper login information, have the network routes available and have MSDTC configured properly.  We have a lot of linked servers and this created many failure points.  Some of the older linked servers used IP addresses and not DNS names.  This meant we had to go in and touch all those linked servers when the servers moved.
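
Finding linked servers that still point at literal IP addresses is easy to automate once you have each linked server's name and data source (in SQL Server 2005 and later, sys.servers exposes these as the name and data_source columns). A Python sketch using the standard ipaddress module; the sample linked servers are hypothetical, and only the tcp: prefix, ,port suffix and \instance forms are handled:

```python
import ipaddress


def uses_ip_address(data_source):
    """True if a linked server data source names its host by literal IP address.

    Accepts forms such as '10.0.0.5', 'tcp:10.0.0.5,1433' and '10.0.0.5\\INST1'.
    """
    host = data_source.strip()
    if host.lower().startswith("tcp:"):
        host = host[4:]
    # Drop any \instance name and ,port suffix to isolate the host part.
    host = host.split("\\", 1)[0].split(",", 1)[0].strip()
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical linked servers: name -> data source
    linked = {
        "LEGACY": "10.1.2.3",
        "REPORTS": r"reportsrv\SQL2005",
        "HR": "tcp:192.168.0.10,1433",
    }
    print([name for name, source in linked.items() if uses_ip_address(source)])
```

Running a check like this before a migration gives you the list of linked servers to repoint at DNS names.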

MacGyver Moments
Fri, 26 Feb 2010 02:50:34 GMT

Denny Cherry tagged me to write about my best MacGyver Moment.  Usually I ignore blogosphere fluff and just use this space to write what I think is important.  However, #MVP10 just ended and I have a stronger sense of community.  Besides, where else would I mention that my second-best MacGyver moment was making a BIOS jumper out of a soda can?  Aluminum is conductive and I didn't have any real jumpers lying around.

My best moment is probably my entire home computer network. 

Every system but one is hand-built, usually cobbled together out of spare parts and 'adapted' from its original purpose.

My Primary Domain Controller is a Dell 2300.   The Service Tag indicates it was shipped to the original owner in 1999.  Box has a PERC/1 RAID controller.  I acquired this from a previous employer for $50.  It runs Windows Server 2003 Enterprise Edition.  Does DNS, DHCP, and RADIUS services as a bonus.  RADIUS authentication is used for VPN and Wireless access.  It is nice to sign in once and be done with it.

The Secondary Domain Controller is an old desktop.  Dual P-III 933 with some extra drives.

My VPN box is a P-II 250 with 384MB of RAM and a 21 GB hard drive.  I did a P-to-V to my Hyper-V box a year or so ago and retired the hardware again.  Dynamic DNS lets me connect no matter how often Comcast shuffles my IP.

The Hyper-V box is a desktop system with 8GB RAM and an AMD Athlon 5000+ processor.  Cost me less than $500 to put together nearly two years ago.  I reasoned that if Vista and Windows 2008 were the same code then Vista 64-bit certified meant the drivers for Vista would load into Windows 2008.  Turns out I was right.

Later I added three 1TB drives but wasn't too happy with how that turned out.  I recovered two of the drives yesterday and am building an iSCSI storage unit.  (Much thanks to Starwind.  Great product.)  I am using an old AMD 1.1 GHz box with 1.5 GB RAM (cobbled together from three old PCs) as my storage server.

The Hyper-V box is slated for an OS rebuild to 2008 R2 once I get the storage system worked out.  Maybe in a week or two.

A couple of D-Link Gigabit switches tie everything together.

Add in the Vonage box, the three PCs, the Wireless-N Access Point, the two notebooks and the XBox and you have gone from MacGyver to darn near Rube Goldberg.

The only thing I really spend money on is power supplies and fans.  I buy top-of-the-line for both.

I even pull and crimp my own cables.

Oh, and if my kids hose up a PC, I have all of their data on a server elsewhere.  Every PC and laptop is pretty much interchangeable for email and basic workstation tasks.  That helps a lot too.

Of course I will tag SQLVariant.

 

WiX 3 Tutorial: Generating file/directory fragments with Heat.exe
Tue, 23 Feb 2010 12:34:02 GMT

In previous posts I’ve shown you our SuperForm test application solution structure and what the main wxs file and the wxi include file look like. In this post I’ll show you how to automate the inclusion of files to install into your build process. For our SuperForm application we have a single exe to install, but in the real world there are tens or hundreds of files, from DLLs to resource files like pictures, depending on what kind of application you’re building. Writing a directory structure for so many files by hand is out of the question. What we need is an automated way to create this structure. Enter Heat.exe.

Heat is a command-line utility that harvests a file, directory, Visual Studio project, IIS website or performance counters. What does harvesting mean? Harvesting is converting a source (file, directory, …) into a component structure saved in a WiX fragment (.wxs) file.

There are 2 options you can use:

  1. Create a static wxs fragment with Heat and include it in your project. The pro of this is that you can add or remove components by hand. The con is that you have to do the pro part by hand. Automation always beats manual labor.
  2. Run the Heat command-line utility in a pre-build event of your WiX project. I prefer this way. By always recreating the whole fragment you don’t have to worry about missing any new files you add. The con is that you’ll include files that you might otherwise not want to.

There is no perfect solution, so pick one and deal with it. A neat way of overcoming the con of the second option is to add a post-build event to your main application project (SuperForm.MainApp in our case) that copies the files to be installed to a special location, and have Heat.exe read them from there. I haven’t set this up for this tutorial; I’m simply including all files from the default SuperForm.MainApp\bin directory.
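
The post-build copy step suggested above could look something like this Python sketch, which you might call from the main project's post-build event with the bin folder and the SuperFormFilesDir folder as arguments. The exclusion patterns, function name and folder layout are assumptions, not the tutorial's setup, and the staging folder is wiped on each run, so point it at a folder used for nothing else:

```python
import fnmatch
import shutil
import tempfile
from pathlib import Path

# Build outputs we assume should not be installed; adjust for your project.
EXCLUDE_PATTERNS = ("*.pdb", "*.vshost.exe", "*.vshost.exe.manifest")


def stage_install_files(bin_dir, staging_dir, exclude=EXCLUDE_PATTERNS):
    """Copy the files in bin_dir that should be installed into staging_dir.

    staging_dir is deleted and recreated first so files removed from the build
    do not linger and get harvested. Returns the sorted names of copied files.
    """
    src, dst = Path(bin_dir), Path(staging_dir)
    if dst.exists():
        shutil.rmtree(dst)
    dst.mkdir(parents=True)
    copied = []
    for path in src.iterdir():
        if path.is_file() and not any(fnmatch.fnmatch(path.name, p) for p in exclude):
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return sorted(copied)


if __name__ == "__main__":
    # Demo with throwaway files named like the SuperForm build output.
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = Path(tmp, "bin")
        bin_dir.mkdir()
        for name in ("SuperForm.MainApp.exe", "SuperForm.MainApp.pdb",
                     "SuperForm.MainApp.vshost.exe"):
            (bin_dir / name).write_text("placeholder")
        print(stage_install_files(bin_dir, Path(tmp, "staging")))
```

Heat.exe then harvests only what landed in the staging folder, so the .pdb and vshost files never reach the fragment.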

Remember how we created a System Environment variable called SuperFormFilesDir? This is where we’ll use it for the first time. The command line text that you have to put into the pre-build event of your WiX project looks like this:

"$(WIX)bin\heat.exe" dir "$(SuperFormFilesDir)" -cg SuperFormFiles -gg -scom -sreg -sfrag -srd -dr INSTALLLOCATION -var env.SuperFormFilesDir -out "$(ProjectDir)Fragments\FilesFragment.wxs"

After you install WiX you’ll get the WIX environment variable. In pre/post-build events, environment variables are referenced like this: $(WIX). This way you don’t have to think about the WiX installation path. Remember that the Program Files folder for 32-bit applications is named differently on 32-bit and 64-bit systems (Program Files vs. Program Files (x86)). $(ProjectDir) is the path to your project and is a built-in Visual Studio variable.

You can view all Heat.exe options by running it without parameters but I’ll explain some that stick out the most.

  1. dir "$(SuperFormFilesDir)": tell Heat to harvest the whole directory at the set location. That is the location we’ve set in our System Environment variable.
  2. -cg SuperFormFiles: the name of the component group that will be created. This name is referenced in our Feature tag, as shown in the previous post.
  3. -dr INSTALLLOCATION: the directory reference this fragment will fall under. You can see the top level directory structure in the previous post.
  4. -var env.SuperFormFilesDir: the name of the variable that will replace the SourceDir text that would otherwise appear in the fragment file.
  5. -out "$(ProjectDir)Fragments\FilesFragment.wxs": the full path and name under which the fragment file will be saved.
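
If you want to run the same harvest outside Visual Studio (for example from a build-server script), the pre-build command can be expressed as an argument list. A Python sketch, assuming the WIX and SuperFormFilesDir environment variables are set as described; it uses only the switches from the command above, and the function name is hypothetical:

```python
import os


def heat_command(project_dir, files_dir_var="SuperFormFilesDir"):
    """Build the Heat.exe argument list used in the pre-build event above.

    Reads the WIX and SuperFormFilesDir environment variables, mirroring the
    $(WIX) and $(SuperFormFilesDir) references in the Visual Studio command.
    """
    wix_root = os.environ.get("WIX", "")
    return [
        os.path.join(wix_root, "bin", "heat.exe"),
        "dir", os.environ.get(files_dir_var, ""),
        "-cg", "SuperFormFiles",
        "-gg", "-scom", "-sreg", "-sfrag", "-srd",
        "-dr", "INSTALLLOCATION",
        "-var", "env." + files_dir_var,
        "-out", os.path.join(project_dir, "Fragments", "FilesFragment.wxs"),
    ]


if __name__ == "__main__":
    print(" ".join(heat_command(".")))
```

On a machine with WiX installed, passing the list to subprocess.run(heat_command(project_dir), check=True) runs the harvest and raises an error if Heat.exe fails; passing a list avoids quoting problems with paths that contain spaces.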

If you use source control, you have to include FilesFragment.wxs in your project but remove its source control binding, since the file is regenerated on every build. The auto-generated FilesFragment.wxs for our test app looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
<Fragment>
<ComponentGroup Id="SuperFormFiles">
<ComponentRef Id="cmp5BB40DB822CAA7C5295227894A07502E" />
<ComponentRef Id="cmpCFD331F5E0E471FC42A1334A1098E144" />
<ComponentRef Id="cmp4614DD03D8974B7C1FC39E7B82F19574" />
<ComponentRef Id="cmpDF166522884E2454382277128BD866EC" />
</ComponentGroup>
</Fragment>
<Fragment>
<DirectoryRef Id="INSTALLLOCATION">
<Component Id="cmp5BB40DB822CAA7C5295227894A07502E" Guid="{117E3352-2F0C-4E19-AD96-03D354751B8D}">
<File Id="filDCA561ABF8964292B6BC0D0726E8EFAD" KeyPath="yes" Source="$(env.SuperFormFilesDir)\SuperForm.MainApp.exe" />
</Component>
<Component Id="cmpCFD331F5E0E471FC42A1334A1098E144" Guid="{369A2347-97DD-45CA-A4D1-62BB706EA329}">
<File Id="filA9BE65B2AB60F3CE41105364EDE33D27" KeyPath="yes" Source="$(env.SuperFormFilesDir)\SuperForm.MainApp.pdb" />
</Component>
<Component Id="cmp4614DD03D8974B7C1FC39E7B82F19574" Guid="{3443EBE2-168F-4380-BC41-26D71A0DB1C7}">
<File Id="fil5102E75B91F3DAFA6F70DA57F4C126ED" KeyPath="yes" Source="$(env.SuperFormFilesDir)\SuperForm.MainApp.vshost.exe" />
</Component>
<Component Id="cmpDF166522884E2454382277128BD866EC" Guid="{0C0F3D18-56EB-41FE-B0BD-FD2C131572DB}">
<File Id="filF7CA5083B4997E1DEC435554423E675C" KeyPath="yes" Source="$(env.SuperFormFilesDir)\SuperForm.MainApp.vshost.exe.manifest" />
</Component>
</DirectoryRef>
</Fragment>
</Wix>

The $(env.SuperFormFilesDir) will be replaced at build time with the directory where the files to be installed are located. There is nothing too complicated about this. In the end it turns out that this sort of automation is great!
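To illustrate the substitution, here is a minimal Python model of what happens to $(env.NAME) references at build time. The WiX preprocessor does much more than this, and the sample path is hypothetical.

```python
import os
import re
from typing import Mapping, Optional

# Matches WiX-style environment references such as $(env.SuperFormFilesDir).
ENV_REF = re.compile(r"\$\(env\.([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_env_refs(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each $(env.NAME) in `text` with NAME's value from `env` (default: os.environ).

    Raises KeyError when a referenced variable is not defined.
    """
    values = os.environ if env is None else env
    # A function replacement is inserted literally, so backslashes in paths stay intact.
    return ENV_REF.sub(lambda m: values[m.group(1)], text)


line = r'<File Source="$(env.SuperFormFilesDir)\SuperForm.MainApp.exe" />'
print(expand_env_refs(line, {"SuperFormFilesDir": r"C:\Builds\SuperForm\bin"}))
```

Because each developer sets SuperFormFilesDir on their own machine, the same generated fragment resolves to a different local path on each machine without anyone editing the wixproj.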

There are a few other ways Heat.exe can compose the wxs file, but this is the one I prefer; it just seems the clearest. Play with its options to see what it can do. It’s one awesome little tool.


WiX 3 tutorial by Mladen Prajdić navigation

Clustering for Mere Mortals (Pt2)
Thu, 18 Feb 2010 21:27:48 GMT -

Planning.

I could stop there and let that be the entirety of post #2 in this series. Planning is the single most important element in building a cluster, and the Laptop Demo Cluster is no exception. One of the more awkward parts of actually creating a cluster is coordinating information between Windows Clustering and SQL Clustering. The dialog boxes show up hours apart but still need matching, consistent information.

Excel seems to be a good tool for tracking these settings.  My workbook has four pages: Systems, Storage, Network, and Service Accounts.  The systems page looks like this:

Name | Role | Software | Location
East | Physical Cluster Node 1 | Windows Server 2008 R2 Enterprise | Laptop VM
West | Physical Cluster Node 2 | Windows Server 2008 R2 Enterprise | Laptop VM
North | Physical Cluster Node 3 (Future Reserved) | Windows Server 2008 R2 Enterprise | Laptop VM
MicroCluster | Cluster Management Interface | N/A | Laptop VM
SQL01 | High-Performance High-Security Instance | SQL Server 2008 Enterprise Edition x64 SP1 | Laptop VM
SQL02 | High-Performance Standard-Security Instance | SQL Server 2008 Enterprise Edition x64 SP1 | Laptop VM
SQL03 | Standard-Performance High-Security Instance | SQL Server 2008 Enterprise Edition x64 SP1 | Laptop VM

Note that everything that has a computer name is listed here, whether physical or virtual.

Storage looks like this:

Storage Name | Instance | Purpose | Volume | Path | Size (GB) | LUN ID | Speed
Quorum | MicroCluster | Cluster Quorum | Quorum | Q: | 2 | |
SQL01Anchor | SQL01 | Instance Anchor | SQL01Anchor | L: | 2 | |
SQL02Anchor | SQL02 | Instance Anchor | SQL02Anchor | M: | 2 | |
SQL01Data1 | SQL01 | SQL Data | SQL01Data1 | L:\MountPoints\SQL01Data1 | 2 | |
SQL02Data1 | SQL02 | SQL Data | SQL02Data1 | M:\MountPoints\SQL02Data1 | | |

Starting at the left is the name used in the storage array.  It is important to rename resources at each level, whether it is Storage, LUN, Volume, or disk folder.  Otherwise, troubleshooting things gets complex and difficult.  You want to be able to glance at a resource at any level and see where it comes from and what it is connected to.

Networking is the same way:

System | Network | VLAN | IP | Subnet Mask | Gateway | DNS1 | DNS2
East | Public | Cluster1 | 10.97.230.x (DHCP) | 255.255.255.0 | 10.97.230.1 | 10.97.230.1 | 10.97.230.1
East | Heartbeat | Cluster2 | | 255.255.255.0 | | |
West | Public | Cluster1 | 10.97.230.x (DHCP) | 255.255.255.0 | 10.97.230.1 | 10.97.230.1 | 10.97.230.1
West | Heartbeat | Cluster2 | | 255.255.255.0 | | |
North | Public | Cluster1 | 10.97.230.x (DHCP) | 255.255.255.0 | 10.97.230.1 | 10.97.230.1 | 10.97.230.1
North | Heartbeat | Cluster2 | | 255.255.255.0 | | |
SQL01 | Public | Cluster1 | 10.97.230.x (DHCP) | 255.255.255.0 | | |
SQL02 | Public | Cluster1 | 10.97.230.x (DHCP) | 255.255.255.0 | | |

One hallmark of a poorly planned and implemented cluster is a bunch of "Local Area Connection #n" entries in the network settings page. That tells me that somebody didn't care about the long-term supportability of the cluster. This can be critically important with Hyper-V clusters and their high NIC counts.

Final page:

Instance | Service Name | Account | Password | Domain | OU
SQL01 | SQL Server | SVCSQL01 | Baseline22 | MicroAD | Service Accounts
SQL01 | SQL Agent | SVCSQL01 | Baseline22 | MicroAD | Service Accounts
SQL02 | SQL Server | SVC_SQL02 | Baseline22 | MicroAD | Service Accounts
SQL02 | SQL Agent | SVC_SQL02 | Baseline22 | MicroAD | Service Accounts
SQL03 (Future) | SQL Server | SVC_SQL03 | Baseline22 | MicroAD | Service Accounts
SQL03 (Future) | SQL Agent | SVC_SQL03 | Baseline22 | MicroAD | Service Accounts

Installation Account | administrator

 Yes.  I write down the account information.  I secure the file via NTFS, but I don't want to fumble around looking for passwords when it comes time to rebuild a node.

Always fill out the workbook COMPLETELY before installing anything.  The whole point is to have everything you need at your fingertips before you begin.  The install experience is so much better and more productive with this information in place.
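If the workbook pages are exported (for example to CSV), a few lines of Python can catch blank cells and mismatched names before you start. The following is a minimal sketch under that assumption. The in-memory rows are a hypothetical excerpt of the Storage page, and the checks cover three things:

- Required cells are filled in.
- Every instance exists on the Systems page.
- The array name matches the volume name, per the naming advice above.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical excerpt of the workbook; a real check would load every exported page.
systems = [{"Name": n} for n in ("East", "West", "North", "MicroCluster", "SQL01", "SQL02", "SQL03")]
storage = [
    {"Storage Name": "SQL01Data1", "Instance": "SQL01", "Volume": "SQL01Data1", "Size (GB)": "2"},
    {"Storage Name": "SQL02Data1", "Instance": "SQL02", "Volume": "SQL02Data1", "Size (GB)": ""},
]


def check_page(page: str, rows: List[Row], key: str, required: List[str]) -> List[str]:
    """Report every blank required cell on one page of the workbook."""
    return [f"{page}: {row[key]} has no {col}"
            for row in rows for col in required if not row.get(col, "").strip()]


def check_references(page: str, rows: List[Row], column: str, known: List[str]) -> List[str]:
    """Report values in `column` that do not match a Name on the Systems page."""
    return [f"{page}: unknown {column} '{row[column]}'" for row in rows if row[column] not in known]


def check_naming(rows: List[Row]) -> List[str]:
    """Report storage whose array name and volume name differ."""
    return [f"Storage: {r['Storage Name']} is mounted as volume {r['Volume']}"
            for r in rows if r["Storage Name"] != r["Volume"]]


names = [s["Name"] for s in systems]
problems = (check_page("Storage", storage, "Storage Name", ["Instance", "Volume", "Size (GB)"])
            + check_references("Storage", storage, "Instance", names)
            + check_naming(storage))
for problem in problems:
    print(problem)
```

For this excerpt the only finding is the missing size for SQL02Data1, which is exactly the kind of gap you want to catch before the install dialogs ask for it.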


Fixing Robocopy for SQL Server Jobs
Wed, 17 Feb 2010 20:05:29 GMT -

Robocopy is one of, if not the, best life-saving/greatest-thing-since-sliced-bread command line utilities ever to come from Microsoft.  If you're not using it already, what are you waiting for?

Of course, being a Microsoft product, it's not exactly perfect. ;)  Specifically, it sets the ERRORLEVEL to a non-zero value even if the copy is successful.  This causes a problem in SQL Server job steps, since non-zero ERRORLEVELs report as failed.

You can work around this by having your SQL job go to the next step on failure, but then you can't determine if there was a genuine error.  Plus you still see annoying red X's in your job history. 

One way I've found to avoid this is to use a batch file that runs Robocopy, followed by a few extra commands (everything after the robocopy line):

robocopy d:\backups \\BackupServer\BackupFolder *.bak
rem suppress successful robocopy exit statuses, only report genuine errors (bitmask 16 and 8 settings)
set/A errlev="%ERRORLEVEL% & 24"
rem exit batch file with errorlevel so SQL job can succeed or fail appropriately
exit/B %errlev%

(The REM statements are simply comments and don't need to be included in the batch file)

The SET command lets you use expressions when you use the /A switch. So I set an environment variable "errlev" to the ERRORLEVEL value bitwise ANDed with 24.

Robocopy's exit codes use a bitmap/bitmask to specify its exit status.  The bits for 1, 2, and 4 do not indicate any kind of failure, but 8 and 16 do.  So by adding 16 + 8 to get 24, and doing a bitwise AND, I suppress any of the other bits that might be set, and allow either or both of the error bits to pass.
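The masking step is easy to check outside of cmd. Here is a small Python equivalent of the SET/A line. It assumes only what is described above: bits 1, 2 and 4 are informational, while 8 and 16 mean failure.

```python
# Robocopy exit-code bits: 1, 2 and 4 are informational, 8 and 16 indicate failures.
ROBOCOPY_ERROR_BITS = 8 | 16  # == 24


def job_step_exit_code(robocopy_exit_code: int) -> int:
    """Python equivalent of set/A errlev="%ERRORLEVEL% & 24"."""
    return robocopy_exit_code & ROBOCOPY_ERROR_BITS


for code in (0, 1, 3, 7, 8, 9, 16, 25):
    print(code, "->", job_step_exit_code(code))
```

Every code built only from 1, 2 and 4 maps to 0, so the SQL job step succeeds. Any code containing 8 or 16 keeps those bits and still fails the step.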

The next step is to use the EXIT command with the /B switch to set a new ERRORLEVEL value, using the "errlev" variable.  This will now return zero (unless Robocopy had real errors) and allow your SQL job step to report success.

This technique should also work for other command-line utilities. The only issue I've found is that the commands have to be part of a batch file, so if you call Robocopy directly in your SQL job step you'd need to move it into a batch. If you have multiple Robocopy calls, you'll need to place the SET/A command ONLY after the last one. You'd therefore lose any errors from previous calls, unless you use multiple "errlev" variables and combine them with a bitwise OR. (I'll leave this as an exercise for the reader.)
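For the multiple-call case, the combining logic can be sketched in Python. Mask each call's exit code with 24, then OR the results, so an error bit from any single call survives. A bitwise AND would only keep bits that every call set, so one failed copy among several successful ones would disappear.

```python
from functools import reduce
from operator import or_
from typing import Iterable

ROBOCOPY_ERROR_BITS = 24  # 8 | 16, the failure bits


def combined_exit_code(exit_codes: Iterable[int]) -> int:
    """Mask each Robocopy exit code and OR the results, so an error from any call survives."""
    return reduce(or_, (code & ROBOCOPY_ERROR_BITS for code in exit_codes), 0)


print(combined_exit_code([1, 9, 3]))  # the second call had a copy failure
print(combined_exit_code([1, 0, 2]))  # all calls succeeded
```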

The SET/A syntax also permits other kinds of expressions to be calculated.  You can get a full list by running "SET /?" on a command prompt.

WiX 3 Tutorial: Understanding main WXS and WXI file
Wed, 17 Feb 2010 12:27:32 GMT -

In the previous post we took a look at the WiX solution/project structure and project properties. We’re still playing with our super SuperForm application, and today we’ll look at the general parts of the main wxs file, SuperForm.wxs, and the wxi include file. For the wxs file, we’ll just go over a general description of what each part does in the code comments. More detailed descriptions will come in future posts about the features themselves.

WXI include file

Include files are exactly what their name implies. To include a wxi file in a wxs file, you have to add an include directive at the beginning of each .wxs file you want it in. If you’ve ever worked with C++, you can think of include files as .h files. For example, if you include SuperFormVariables.wxi in SuperForm.wxs, the variables in the wxi won’t be visible in FilesFragment.wxs or RegistryFragment.wxs. You’d have to include it manually in those two wxs files too.

For a preprocessor variable $(var.VariableName) to be visible in every file of the project, you have to define it in the WiX project properties->Build->“Define preprocessor variables” textbox.

I’ve chosen not to go this route because in multi-developer teams not everyone has the same directory structure. A single project-level variable would mean every developer has to check out the wixproj file to edit it, which is pretty much unacceptable by my standards. That is why we’ve added a System Environment variable named SuperFormFilesDir, as shown in the previous WiX tutorial post. An include file wouldn’t help here either. FilesFragment.wxs is autogenerated on every project build, and we don’t want to edit it manually each time to add the wxi include at its beginning, because then we couldn’t simply recreate it in each pre-build event.

<?xml version="1.0" encoding="utf-8"?>
<Include>
<!--
Versioning. These have to be changed for upgrades.
It's not enough to just include newer files.
-->
<?define MajorVersion="1" ?>
<?define MinorVersion="0" ?>
<?define BuildVersion="0" ?>
<!-- Revision is NOT used by WiX in the upgrade procedure -->
<?define Revision="0" ?>
<!-- Full version number to display -->
<?define VersionNumber="$(var.MajorVersion).$(var.MinorVersion).$(var.BuildVersion).$(var.Revision)" ?>
<!--
Upgrade code HAS to be the same for all updates.
Once you've chosen it don't change it.
-->
<?define UpgradeCode="YOUR-GUID-HERE" ?>
<!--
Path to the resources directory. Resources don't really need to be included
in the project structure, but I like to include them for clarity.
-->
<?define ResourcesDir="$(var.ProjectDir)\Resources" ?>
<!--
The name of your application exe file. This will be used to kill the process when updating
and creating the desktop shortcut
-->
<?define ExeProcessName="SuperForm.MainApp.exe" ?>
</Include>

For now there’s no way to tell WiX in Visual Studio to have a wxi include file available to the whole project, so you have to include it in each file separately.

Only variables set in “Define preprocessor variables” or System Environment variables are accessible to the whole project for now.

The main WXS file: SuperForm.wxs

We’ll only take a look at the general structure of the main SuperForm.wxs, not its details, which we’ll cover in future posts. The code comments should provide plenty of information about what each part does in general.

Basically there are five major parts: the update part, the conditions and actions part, the UI install sequence, the directory structure, and the features we want to include.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Add xmlns:util namespace definition to be able to use stuff from WixUtilExtension dll-->
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi" xmlns:util="http://schemas.microsoft.com/wix/UtilExtension">
<!-- This is how we include wxi files -->
<?include $(sys.CURRENTDIR)Includes\SuperFormVariables.wxi ?>
<!--
Id="*" is to enable upgrading. * means that the product ID will be autogenerated on each build.
Name is made of localized product name and version number.
-->
<Product Id="*" Name="!(loc.ProductName) $(var.VersionNumber)" Language="!(loc.LANG)" Version="$(var.VersionNumber)" Manufacturer="!(loc.ManufacturerName)" UpgradeCode="$(var.UpgradeCode)">
<!-- Define the minimum supported installer version (3.0) and that the install should be done for the whole machine not just the current user -->
<Package InstallerVersion="300" Compressed="yes" InstallScope="perMachine"/>
<Media Id="1" Cabinet="media1.cab" EmbedCab="yes" />
<!-- Upgrade settings. This will be explained in more detail in a future post -->
<Upgrade Id="$(var.UpgradeCode)">
<UpgradeVersion OnlyDetect="yes" Minimum="$(var.VersionNumber)" IncludeMinimum="no" Property="NEWER_VERSION_FOUND" />
<UpgradeVersion Minimum="0.0.0.0" IncludeMinimum="yes" Maximum="$(var.VersionNumber)" IncludeMaximum="no" Property="OLDER_VERSION_FOUND" />
</Upgrade>
<!-- Reference the global NETFRAMEWORK35 property to check if it exists -->
<PropertyRef Id="NETFRAMEWORK35"/>
<!--
Startup conditions that check that .NET Framework 3.5 is installed and
that we're running on Windows XP SP2 or later.
If either check fails, the installation is aborted.
The (Installed OR ...) prefix means that each condition is only
evaluated when the app is being installed, not on uninstall or change.
-->
<Condition Message="!(loc.DotNetFrameworkNeeded)">
<![CDATA[Installed OR NETFRAMEWORK35]]>
</Condition>
<Condition Message="!(loc.AppNotSupported)">
<![CDATA[Installed OR ((VersionNT >= 501 AND ServicePackLevel >= 2) OR (VersionNT >= 502))]]>
</Condition>
<!--
This custom action in the InstallExecuteSequence is needed to
stop silent install (passing /qb to msiexec) from going around it.
-->
<CustomAction Id="NewerVersionFound" Error="!(loc.SuperFormNewerVersionInstalled)" />
<InstallExecuteSequence>
<!-- Check for newer versions with FindRelatedProducts and execute the custom action after it -->
<Custom Action="NewerVersionFound" After="FindRelatedProducts">
<![CDATA[NEWER_VERSION_FOUND]]>
</Custom>
<!-- Remove the previous versions of the product -->
<RemoveExistingProducts After="InstallInitialize"/>
<!-- WixCloseApplications is a built in custom action that uses util:CloseApplication below -->
<Custom Action="WixCloseApplications" Before="InstallInitialize" />
</InstallExecuteSequence>
<!-- This will ask the user to close the SuperForm app if it's running while upgrading -->
<util:CloseApplication Id="CloseSuperForm" CloseMessage="no" Description="!(loc.MustCloseSuperForm)"
ElevatedCloseMessage="no" RebootPrompt="no" Target="$(var.ExeProcessName)" />
<!-- Use the built in WixUI_InstallDir GUI -->
<UIRef Id="WixUI_InstallDir" />
<UI>
<!-- These dialog references are needed for CloseApplication above to work correctly -->
<DialogRef Id="FilesInUse" />
<DialogRef Id="MsiRMFilesInUse" />
<!-- Here we'll add the GUI logic for installation and updating in a future post-->
</UI>
<!-- Set the icon to show next to the program name in Add/Remove programs -->
<Icon Id="SuperFormIcon.ico" SourceFile="$(var.ResourcesDir)\Exclam.ico" />
<Property Id="ARPPRODUCTICON" Value="SuperFormIcon.ico" />
<!-- Installer UI custom pictures. File names are made up. Add the path to your pictures. -->
<!--
<WixVariable Id="WixUIDialogBmp" Value="MyAppLogo.jpg" />
<WixVariable Id="WixUIBannerBmp" Value="installBanner.jpg" />
-->
<!-- the default directory structure -->
<Directory Id="TARGETDIR" Name="SourceDir">
<Directory Id="ProgramFilesFolder">
<Directory Id="INSTALLLOCATION" Name="!(loc.ProductName)" />
</Directory>
</Directory>
<!--
Set the default install location to the value of
INSTALLLOCATION (usually c:\Program Files\YourProductName)
-->
<Property Id="WIXUI_INSTALLDIR" Value="INSTALLLOCATION" />
<!-- Set the components defined in our fragment files that will be used for our feature -->
<Feature Id="SuperFormFeature" Title="!(loc.ProductName)" Level="1">
<ComponentGroupRef Id="SuperFormFiles" />
<ComponentRef Id="cmpVersionInRegistry" />
<ComponentRef Id="cmpIsThisUpdateInRegistry" />

</Feature>
</Product>
</Wix>

For more info on what certain attributes mean you should look into the WiX Documentation.
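The version and OS checks in SuperForm.wxs can be modelled in a few lines of Python. This is a simplified sketch, not how Windows Installer evaluates them internally, and it makes three points:

- Only major.minor.build are compared. Windows Installer ignores the fourth (revision) field of a product version, which matches the "Revision is NOT used" comment in the wxi.
- VersionNT 501 is Windows XP and 502 is Windows Server 2003 / XP x64.
- The two Condition elements must both pass unless the product is already installed.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'1.0.0.0' -> (1, 0, 0, 0); missing fields count as 0."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def upgrade_property(installed: str, current: str) -> Optional[str]:
    """Which property the two UpgradeVersion rows would set for an installed product version."""
    inst, cur = parse_version(installed)[:3], parse_version(current)[:3]
    if inst > cur:  # Minimum=current, IncludeMinimum="no"
        return "NEWER_VERSION_FOUND"
    if inst < cur:  # Minimum=0.0.0.0 (inclusive) up to Maximum=current (exclusive)
        return "OLDER_VERSION_FOUND"
    return None     # same major.minor.build: neither row matches


def launch_conditions_pass(installed: bool, netfx35: bool, version_nt: int, sp_level: int) -> bool:
    """Both 'Installed OR ...' conditions must be true for the install to continue."""
    os_ok = (version_nt >= 501 and sp_level >= 2) or version_nt >= 502
    return installed or (netfx35 and os_ok)


print(upgrade_property("1.0.0.0", "1.1.0.0"))
print(upgrade_property("1.0.0.7", "1.0.0.0"))
print(launch_conditions_pass(False, True, 501, 1))
```

The second call shows why bumping only the revision is not enough for an upgrade: the installed product is neither newer nor older as far as these rows are concerned.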


Guessing Excel Data Types
Wed, 17 Feb 2010 06:32:28 GMT -

Note to Self

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel: TypeGuessRows = 0 means scan everything.

Note to Others

About 10 years ago I stumbled across this bit of information just when I needed it, and it saved my project. A few years later, when it would have been nice but not critical, I could not find it again anywhere. Now I have stumbled across it again, and to preserve my future self from nightmares and sudden baldness due to pulling my hair out, I have decided to blog it in the hope that I can find it again this way.

Here’s the story…  When you query data from an Excel spreadsheet, such as with old-fashioned DTS packages in SQL 2000 (my first reference) or simply with an OLEDB Data Adapter from ASP.NET (recent task) and if you are using the Microsoft Jet 4.0 driver (newer ones may deal with this differently) then you can get funny results where the query reports back that a cell value is null even when you know it contains data.

What happens is that Excel doesn’t really have data types.  While you can format information in cells to appear like certain data types (e.g. Date, Time, Decimal, Text, etc.) that is not really defining the cell as being of a certain type like we think of when working with databases.  But, presumably, to make things more convenient for the user (programmer) when you issue a query against Excel, the query processor tries to guess what type of data is contained in each column and returns it in an appropriate manner.  This is all well and good IF your data is consistent in every row and matches what the processor guessed.  And, for efficiency’s sake, when the query processor is trying to figure out each column’s data type, it does so by analyzing only the first 8 rows of data (default setting).

Now here’s the problem. Suppose that your spreadsheet contains information about clothing, and one of the columns is Size. Suppose that in the first 8 rows all of your sizes are numbers like 32, 34, 18, 10, and so on, but somewhere after the 8th row you have some rows with sizes like S, M, L, XL. By examining only the first 8 rows, the query processor inferred that the column contained numerical data, and when it hits the non-numerical data in later rows, it comes back blank. Major bummer, and a real pain to track down if you don’t know that Excel is doing this, because you study the spreadsheet and say, “The data is RIGHT THERE! WHY doesn’t the query see it?!?!” And the hair-pulling begins.

So, what’s a developer to do?  One option is to go to the registry setting noted above and change the DWORD value of TypeGuessRows from the default of 8 to 0 (zero).  Setting this value to zero will force Jet to scan every row in the spreadsheet before making its determination as to what type of data the column contains.  And that means that in the example above, it would have treated the column as a string rather than as numeric, and presto! your query now returns all of the values that you know are in there.

Of course, there is a caveat: if you are querying large spreadsheets, making Jet scan every row can be quite a performance hit. You could enter a different number (more than 8) that you believe is a better sample of rows for the guess, but it is still possible that every scanned row looks alike while later rows are different, and you might get blanks where there really is data. That’s the type of gamble I really don’t like to take with my data.
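To see the effect of TypeGuessRows without touching the registry, here is a deliberately simplified Python model of the behaviour described above. In this model, a column counts as numeric only if every sampled value parses as a number, and cells that don't fit the guessed type come back as None. The real Jet rules are more involved: they look at the majority type in the sampled rows and also depend on the ImportMixedTypes setting under the same registry key.

```python
from typing import List, Union

Cell = Union[float, str, None]


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_column_type(values: List[str], type_guess_rows: int = 8) -> str:
    """Simplified guess: numeric only if every sampled value is a number.

    type_guess_rows == 0 means "sample every row", like TypeGuessRows = 0.
    """
    sample = values if type_guess_rows == 0 else values[:type_guess_rows]
    return "numeric" if all(_is_number(v) for v in sample) else "text"


def read_column(values: List[str], type_guess_rows: int = 8) -> List[Cell]:
    """Return values as a query would see them: cells that don't fit the guessed type become None."""
    if guess_column_type(values, type_guess_rows) == "text":
        return list(values)
    return [float(v) if _is_number(v) else None for v in values]


sizes = ["32", "34", "18", "10", "32", "34", "18", "10", "S", "M", "L", "XL"]
print(read_column(sizes))     # only the first 8 rows are sampled
print(read_column(sizes, 0))  # every row is sampled
```

With the default sample, S, M, L and XL come back as None. Scanning every row keeps them.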

Anyone with a better approach, or with experience with more recent drivers that have a better way of handling data types, please chime in!

mysql_explain_log is part of the standard MySQL distribution. It can be used to feed general MySQL logs back into MySQL and use EXPLAIN on all statements to analyse which indexes have been used and which queries didn't use any index.

dBforums Forum for various database types. 

phpMyAdmin is a web based database administration tool specifically for managing MySQL databases

The Windows® Azure™ Platform is an internet-scale cloud computing and services platform hosted in Microsoft data centers. The Windows Azure Platform provides a range of functionality to build applications that span from consumer web to enterprise scenarios, and includes a cloud operating system and a set of developer services. Because it is fully interoperable through industry standards and web protocols such as REST and SOAP, you can use the Azure services individually or together, either to build new applications or to extend existing ones. What is the Windows Azure Platform? Windows Azure Platform Training Kit includes a set of technical content, including hands-on labs, presentations, and demos designed to help you learn how to use the Windows Azure platform, including SQL Azure and .NET Services. Windows Azure Platform Developer Center. More Microsoft® Windows 7, Windows 8, Vista, XP, etc.

Embarcadero Developer Network (EDN), the community site for developers where you can access, leverage and contribute valuable information and knowledge at any time. The knowledge, systems, and membership that are EDN exist to enhance the effectiveness of your day-to-day job performance, enrich the career of anyone involved in systems development and management, and extend the breadth and depth of our industry. Includes C++Builder, Delphi, JBuilder, InterBase, Rapid SQL, etc. C++Builder is a RAD (Rapid Application Development) C++ environment and framework designed for ultra-fast development of highly maintainable Windows GUI applications.

w3schools Free HTML, XHTML, CSS, JavaScript, XML, ASP, PHP, SQL tutorials with lots of working examples and source code.

Programming Applications (VBA). Using Applications.

Hosting questions you should ask

Back to top ® © ™ are owned by respective authors and websites. There may be a charge for some software.


Trouble Shooting SQL. Some of these links may help (you are not alone). I cannot vouch for any of these ideas, but they may help.

The Full-Text Stuff That We Didn't Put In The Manual

MySQL Main Site Troubleshooting search

MySQL Operating System Error Codes

MSDN SQL Server Troubleshooting and Support

MySQL online help

Problems and Common Errors    Server Error Codes and Messages What to Do If MySQL Keeps Crashing


MySQL Bugs report Search for reported MySQL bugs and information on reporting bugs with MySQL SQL Tools Summary

Maatkit Tools for SQL. Makes MySQL easier and safer to manage. It provides simple, predictable ways to do things you cannot otherwise do. It would be nice if these features were included with MySQL, but they are not. That's why Maatkit now ships by default with many GNU/Linux distributions such as Debian and CentOS. You can use Maatkit to prove replication is working correctly, fix corrupted data, automate repetitive tasks, speed up your servers, and much, much more. This is the older MySQL Toolkit. This toolkit contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability. A set of essential tools for MySQL users, developers and administrators. The project's goal is to make high-quality command-line tools that follow the UNIX philosophy of doing one thing and doing it well. They are designed for scriptability and ease of processing with standard command-line utilities such as Awk and Sed.

MySQL GUI Tools Downloads

Troubleshooting Problems with MySQL Programs (devshed)

Text Stopwords The default list of full-text stop words.

Reserved Words. Certain words such as SELECT, DELETE, or BIGINT are reserved and require special treatment for use as identifiers such as table and column names. This may also be true for the names of built-in functions. Reserved words are permitted as identifiers if you quote them as described in Section 8.2, Schema Object Names.
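In MySQL, "quoting" an identifier means wrapping it in backticks, or in double quotes when the ANSI_QUOTES SQL mode is enabled. A backtick inside a quoted identifier is written twice. Query placeholders only bind values, not table or column names, so generated SQL has to quote identifiers itself. Here is a small Python helper as a sketch:

```python
def quote_mysql_identifier(name: str) -> str:
    """Quote a table or column name for MySQL with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


# 'desc' and 'order' are both reserved words, so they must be quoted to be used as names.
print(f"SELECT {quote_mysql_identifier('desc')} FROM {quote_mysql_identifier('order')}")
```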

SQL Server Performance

MySQL Performance Blog

Sphinx (SQL Phrase Index), a free open-source SQL full-text search engine. It provides fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently, built-in data sources support fetching data either via a direct connection to MySQL or PostgreSQL, or through an XML pipe mechanism. Sphinx: free open-source SQL full-text search engine. As we know, built-in full-text search is currently limited to the MyISAM storage engine and has a few other limits. Today the Sphinx Search plugin for MySQL was released, which provides a fast and easy-to-use full-text search solution for all storage engines. This version also adds a lot of other new features, including boolean search and distributed searching.

MySQL crashes if full-text search is on with some words:

MySQL crashes if full-text search is on with some words; continued

Sometimes the database crashes with some full-text searches, and only with some words (combinations of words), not all.

Full text searches causing crashes

MySQL constantly crashes while executing the full-text search query in boolean mode when the search key contains two words, the first one is shorter than the minimum word length for the full-text index, and it possibly contains an escaped quote, like: WHERE MATCH(post.title,pagetext) AGAINST ('+3\" +exhaust' IN BOOLEAN MODE)
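One defensive approach, offered as a hypothetical mitigation rather than the fix from that bug report, is to clean the search text before it reaches MATCH ... AGAINST. The sketch keeps only runs of word characters, so quotes and boolean-mode operators never reach the server. It also drops words shorter than ft_min_word_len, which the full-text index ignores anyway. The default of 4 is MySQL's documented default; check your server with SHOW VARIABLES LIKE 'ft_min_word_len'. Pass the result as a query parameter (for example AGAINST (%s IN BOOLEAN MODE) with a MySQL driver) rather than pasting it into the SQL.

```python
import re

FT_MIN_WORD_LEN = 4  # MySQL's default ft_min_word_len; your server may differ


def boolean_mode_terms(user_input: str, min_len: int = FT_MIN_WORD_LEN) -> str:
    """Turn free text into a '+word +word' string for MATCH ... AGAINST (... IN BOOLEAN MODE).

    Only runs of word characters are kept, so quotes and operators such as + - < > ( ) ~ * " @
    are removed; words shorter than min_len are dropped because they are not indexed.
    """
    words = re.findall(r"\w+", user_input)
    return " ".join("+" + word for word in words if len(word) >= min_len)


print(boolean_mode_terms('3" exhaust'))
```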

Full Text Search bug

SQL query crashes DB. Looks fine to me?

Fine-Tuning MySQL Full-Text Search

Server system variables; #sysvar_ft_min_word_len

SQL Error 28 and Errcode 30. Error 28 means there is no space left on the device, often because "MySQL's temporary directory" (/tmp, not the main drive) has run out of space. Errcode 30 means a read-only file system; it may also show up as a /tmp file issue, with a message like: execute failed: Can't create/write to file '/tmp/#sql_xyx.MYI' (Errcode: 30). A similar type of error may be caused by too many connections. Error 28 may also occur when a database is bloated with duplicate data; SQL databases should be normalized.
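The number in "(Errcode: N)" is an operating-system error number. On Linux, 28 is ENOSPC ("No space left on device") and 30 is EROFS ("Read-only file system"). The following Python sketch decodes those numbers and checks free space in the temporary directory. The /tmp path and the 512 MB threshold are assumptions; look up MySQL's real temporary directory with SHOW VARIABLES LIKE 'tmpdir'.

```python
import errno
import os
import shutil

MYSQL_TMPDIR = "/tmp"  # assumption: check SHOW VARIABLES LIKE 'tmpdir' on your server
MIN_FREE_MB = 512      # hypothetical alert threshold


def describe_errcode(code: int) -> str:
    """Translate the number in '(Errcode: N)' into the operating system's message."""
    return f"Errcode {code}: {os.strerror(code)}"


def tmpdir_has_room(path: str = MYSQL_TMPDIR, min_free_mb: int = MIN_FREE_MB) -> bool:
    """True when `path` has at least min_free_mb megabytes free."""
    return shutil.disk_usage(path).free >= min_free_mb * 1024 * 1024


print(describe_errcode(errno.ENOSPC))
print(describe_errcode(errno.EROFS))
print("tmpdir OK" if tmpdir_has_room() else "tmpdir is running out of space")
```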

More info about SQL Error 28 and how to avoid it

Cleaning up the /tmp directory on busy cPanel web hosting servers. MySQL keeps its temporary files in /tmp, and if there is no space left there, queries start failing. Uploads from PHP or Perl are also stored there until the upload finishes, so once the space runs out, uploads fail too. The result is failing MySQL queries and incomplete PHP uploads: system administrator hell.

'key_buffer_size' issue

MySQL headaches: Disk Full, Errcode 28

MySQL error 28 and solution

Where MySQL Stores Temporary Files  

SQL Other problem solvers:-
MySQL Crash Recovery

Recover accidentally removed table files from a MySQL Server

Forums. Computing, webmaster, programming Forums

querysniffer is a MySQL query sniffer written with Net::Pcap. It sniffs the network with pcap, extracts queries from mysql packets, and prints them on standard output.



SQL injection

What is SQL Injection? It is a way to inject an SQL query or command through user input, often via web pages. Many web pages take parameters from the user and use them to build an SQL query against the database. With SQL injection, an attacker can send input that makes the query do something undesired; the damage can be compared to issuing a format *.* in DOS.
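The difference between a vulnerable query and a safe one fits in a few lines. This Python sketch uses the standard-library sqlite3 module so that it runs anywhere; the table and data are made up. MySQL drivers work the same way but typically use %s instead of ? as the placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])


def find_user_unsafe(name):
    # VULNERABLE: the user's input becomes part of the SQL text itself.
    return conn.execute(f"SELECT name, secret FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(name):
    # Safe: the value is passed separately and is never parsed as SQL.
    return conn.execute("SELECT name, secret FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
print(find_user_unsafe(payload))  # the WHERE clause becomes always true
print(find_user_safe(payload))    # no user is literally named x' OR '1'='1
```

The unsafe version returns every row, because the payload closes the string literal and adds OR '1'='1'. The parameterized version treats the whole payload as a plain name.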

SQL Injection watch a video about it:- Web Application Security (SQL Injection)

SQL Injection in detail

SQL Injection Walkthrough This article will try to help beginners with grasping the problems facing them while trying to utilize SQL Injection techniques, to successfully utilize them, and to protect themselves from such attacks.

SQL Injection Attacks by Example

SQL Injection

SQL Injection

SQL Injection Cheat Sheets

SQL Injection Prevention Cheat Sheet This article is focused on providing clear, simple, actionable guidance for preventing SQL Injection flaws in your applications. SQL Injection attacks are unfortunately very common, and this is due to two factors: a) the significant prevalence of SQL Injection vulnerabilities, and b) the attractiveness of the target (i.e., the database typically contains all the interesting/critical data for your application).

SQL Injection Attacks and Some Tips on How to Prevent Them. Discusses various aspects of SQL Injection attacks, what to look for in your code, and how to secure it against SQL Injection attacks.

SQL Injection wiki Everything About SQL Injection

SQL injection Basic Tutorial

Introducing Bucket: A Minimal Dependency Injection Container for PHP

How to Detect and Prevent a WordPress Spam Injection Attack. Spam injection software hides spam keyword links in code that is usually encoded with a PHP function that effectively scrambles the HTML, to be decoded once it is safely embedded in your server, database, etc. You won't see these files decoded, but Googlebot and other bots will when crawling your site! Once the bots access the code, the spam injection software has done its work, effectively stealing your search index to improve its own PageRank. Also see Blogs
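A crude first pass at detection is to search the site's PHP files for code that decodes a string and immediately executes it. The following Python sketch assumes the hidden code uses eval() around base64_decode, gzinflate or str_rot13. These are common choices, but real attacks vary, so treat hits as leads to inspect, not proof. The WordPress path in the usage comment is hypothetical.

```python
import os
import re
from typing import List

# Heuristic: eval() wrapped around a decoding function is a common way to hide injected code.
SUSPICIOUS = re.compile(r"eval\s*\(\s*(base64_decode|gzinflate|str_rot13)\s*\(", re.IGNORECASE)


def suspicious_php_files(root: str) -> List[str]:
    """Return paths of .php files under `root` whose contents match SUSPICIOUS."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if SUSPICIOUS.search(handle.read()):
                    hits.append(path)
    return sorted(hits)


# Example (hypothetical path): print(suspicious_php_files("/var/www/wordpress"))
```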



A Computer Portal. Freeware, Shareware. Download software. Computer languages and Programming code. Including  PERL Scripts and Java Scripts. Webmaster Tools. Internet Marketing, Website promotion. Hardware Help from BIOS to Windows and UNIX.

® © ™ are owned by respective authors and websites. There may be a charge for some software. Google™ is a trademark of Google Inc, These pages are not endorsed by Google or any other Company