1.
The MySQL error log shows:
140331 10:08:18 [ERROR] Error reading master configuration
140331 10:08:18 [ERROR] Failed to initialize the master info structure
140331 10:08:18 [Note] Event Scheduler: Loaded 0 events
 
The message points at the master info structure. Locate the master.info file in the data directory and inspect it with cat:
# cat master.info 
 
18
luocs-mysql-
 
bin.000004
267
1.1.1.1
rep1
PASSWORD
3306
60
0
 
 
 
 
 
0
1800.000
 
0
 
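The format is clearly broken. The fix is simple: run RESET SLAVE. This removes master.info (together with relay-log.info and the relay logs), and the error disappears. Replication then has to be configured again with CHANGE MASTER TO.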
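A quick way to spot this kind of damage without reading the file by eye is sketched below. It assumes the usual layout of master.info, where line 1 holds the number of lines in the file, line 2 the master binlog file name, and line 3 the read position; the function name check_master_info is made up for this example, and the two sample strings are typed in from the listing above (assuming the blank lines in the listing really are in the file).

```python
def check_master_info(text):
    """Return a list of problems found in the first lines of a master.info file."""
    lines = text.split("\n")
    problems = []
    # Line 1 is expected to hold the number of lines in the file.
    if not lines[0].strip().isdigit():
        problems.append("line 1 is not a line count: %r" % lines[0])
    # Line 3 is expected to hold the numeric read position in the master binlog.
    if len(lines) < 3 or not lines[2].strip().isdigit():
        problems.append("line 3 is not a binlog position")
    return problems


# The damaged file from the listing: a leading blank line and a split file name.
broken = "\n18\nluocs-mysql-\n\nbin.000004\n267\n1.1.1.1\nrep1\nPASSWORD\n3306\n60\n0\n"
# What the same content would look like with the file name on one line.
healthy = "18\nluocs-mysql-bin.000004\n267\n1.1.1.1\nrep1\nPASSWORD\n3306\n60\n0\n"

print(check_master_info(broken))
print(check_master_info(healthy))
```

The check is deliberately shallow: it only flags the two symptoms visible in this incident and does not validate the remaining fields.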
 
———————————————————————————————————————————————————————–
2.
Adding a column to a large MySQL table fails with:
ERROR 1799 (HY000) at line 1: Creating index 'PRIMARY' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again. 
 
Solution:
My database runs MySQL 5.6 (innodb_online_alter_log_max_size was introduced in 5.6.6 and does not exist in 5.5), and the variable is at its default of 128M.
mysql> show variables like 'innodb_online_alter_log_max_size';
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| innodb_online_alter_log_max_size | 134217728 |
+----------------------------------+-----------+
1 row in set (0.00 sec)
 
The variable is dynamic and global, so it can be raised at runtime:
mysql> set global innodb_online_alter_log_max_size=402653184;
Query OK, 0 rows affected (0.03 sec)
 
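Raise it to a size that fits the workload. The statement above sets 384M; to add a column to a 120G table I set it to 4G, and the ALTER then succeeded.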
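The variable takes its value in bytes, so sizes such as 128M or 4G have to be converted first. The helper below is a small sketch of that arithmetic; the function names are invented for this example, and the units are treated as binary multiples (1M = 1024 * 1024 bytes), which matches the default value 134217728 shown above.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(size):
    """Convert a size such as '128M' or '4G' to bytes (binary multiples)."""
    size = size.strip().upper()
    if size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


def set_global_statement(size):
    """Build the SET GLOBAL statement for the given human-readable size."""
    return "SET GLOBAL innodb_online_alter_log_max_size = %d;" % size_to_bytes(size)


print(size_to_bytes("128M"))       # the default
print(size_to_bytes("384M"))       # the value used in the SET GLOBAL above
print(set_global_statement("4G"))  # the value that worked for the 120G table
```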
 
———————————————————————————————————————————————————————–
3.
The MySQL log shows:
140306 12:03:25  InnoDB: ERROR: the age of the last checkpoint is 9434024,
InnoDB: which exceeds the log group capacity 9433498.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.
 
The InnoDB redo log is too small: a transaction generated a large amount of redo, but innodb_log_file_size is set too low. Increasing it resolves the problem.
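To see how far over the limit the server went, the two numbers can be pulled out of the message. The sketch below parses the error text shown above and also encodes the hint from the log itself (combined log size at least 10 times the largest row). Both function names are hypothetical, and the 10x rule is only the lower bound the message mentions, not a sizing recommendation.

```python
import re


def parse_checkpoint_error(log_text):
    """Extract (checkpoint_age, log_group_capacity) from the InnoDB error text."""
    age = re.search(r"age of the last checkpoint is (\d+)", log_text)
    cap = re.search(r"exceeds the log group capacity (\d+)", log_text)
    if not age or not cap:
        return None
    return int(age.group(1)), int(cap.group(1))


def min_combined_log_size(largest_row_bytes):
    """Lower bound from the log hint: combined log size >= 10 x the largest row."""
    return 10 * largest_row_bytes


sample = (
    "140306 12:03:25  InnoDB: ERROR: the age of the last checkpoint is 9434024,\n"
    "InnoDB: which exceeds the log group capacity 9433498.\n"
)
age, capacity = parse_checkpoint_error(sample)
print("over capacity by", age - capacity, "bytes")
```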
 
Solution:
The following procedure, found online, is quoted as-is:
STEP 01) Change the following in /etc/my.cnf
[mysqld]
innodb_log_buffer_size          = 32M
innodb_buffer_pool_size         = 3G
innodb_log_file_size            = 768M
STEP 02) mysql -uroot -p -e"SET GLOBAL innodb_fast_shutdown = 0;"
STEP 03) service mysql stop
STEP 04) rm -f /var/lib/mysql/ib_logfile*
STEP 05) service mysql start
I added SET GLOBAL innodb_fast_shutdown = 0;. What does that do? It forces InnoDB to completely purge transactional changes from all of InnoDB moving parts, including the transactional logs (ib_logfile0, ib_logfile1). Thus, there is no need to backup the old ib_logfile0, ib_logfile1. If deleting them makes you nervous, then make Step 04
 
mv /var/lib/mysql/ib_logfile* ..
 
———————————————————————————————————————————————————————–
4.
Adding a column with pt-online-schema-change fails with the following error:
# pt-online-schema-change --alter="add column tag_common text default null" --user=root --password=xxxxxxxx D=MYDB,t=MYTB --execute
Cannot connect to D=lsedata_13Q1,h=10.13.7.47,p=…,u=root
No slaves found.  See --recursion-method if host BJL1-Y13-10-ops.gaoder.net has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
 
# A software update is available:
#   * Percona Toolkit 2.2.6 has a possible security issue (CVE-2014-2029) upgrade is recommended. The current version for Percona::Toolkit is 2.2.7.
 
The table `MYDB`.`MYTB` has triggers.  This tool needs to create its own triggers, so the table cannot already have triggers.
 
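The cause is that MYTB already had triggers. The reason becomes clear from how pt-online-schema-change works:
 
1) If foreign keys reference the table, find the child tables and handle them according to the --alter-foreign-keys-method option;
2) Create a new empty table with the source table's structure and apply the ALTER to it;
3) Create triggers on the source table so that every INSERT, UPDATE, and DELETE issued while the copy is running is applied to the new table as well, so no change is lost;
4) Copy the rows from the source table to the new table;
5) Update the foreign keys of the child tables so that they reference the new table;
6) Rename the source table to an old name, rename the new table to the source name, and drop the old table;
7) Drop the triggers.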
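The copy-and-swap flow in the list above can be tried out on a toy scale. The sketch below is not pt-online-schema-change and does not talk to MySQL: it replays the same steps (shadow table, triggers, copy, rename, cleanup) on an in-memory SQLite database from Python's standard library, with foreign keys left out. The table, column, and trigger names are all made up for the illustration.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    run = conn.execute
    # Source table with two existing rows.
    run("CREATE TABLE mytb (id INTEGER PRIMARY KEY, name TEXT)")
    run("INSERT INTO mytb (id, name) VALUES (1, 'a'), (2, 'b')")

    # Step 2: shadow table with the altered structure.
    run("CREATE TABLE _mytb_new (id INTEGER PRIMARY KEY, name TEXT)")
    run("ALTER TABLE _mytb_new ADD COLUMN tag_common TEXT DEFAULT NULL")

    # Step 3: triggers mirror every write on the source into the shadow table.
    run("""CREATE TRIGGER osc_ins AFTER INSERT ON mytb BEGIN
             INSERT OR REPLACE INTO _mytb_new (id, name) VALUES (NEW.id, NEW.name);
           END""")
    run("""CREATE TRIGGER osc_upd AFTER UPDATE ON mytb BEGIN
             DELETE FROM _mytb_new WHERE id = OLD.id;
             INSERT OR REPLACE INTO _mytb_new (id, name) VALUES (NEW.id, NEW.name);
           END""")
    run("""CREATE TRIGGER osc_del AFTER DELETE ON mytb BEGIN
             DELETE FROM _mytb_new WHERE id = OLD.id;
           END""")

    # Writes that arrive while the migration is in progress.
    run("INSERT INTO mytb (id, name) VALUES (3, 'late')")
    run("UPDATE mytb SET name = 'changed' WHERE id = 1")

    # Step 4: copy existing rows; OR IGNORE keeps rows the triggers already wrote.
    run("INSERT OR IGNORE INTO _mytb_new (id, name) SELECT id, name FROM mytb")

    # Step 6: swap the tables.
    run("ALTER TABLE mytb RENAME TO _mytb_old")
    run("ALTER TABLE _mytb_new RENAME TO mytb")

    # Step 7: drop the triggers, then the old table.
    for trigger in ("osc_ins", "osc_upd", "osc_del"):
        run("DROP TRIGGER IF EXISTS " + trigger)
    run("DROP TABLE _mytb_old")

    return run("SELECT id, name, tag_common FROM mytb ORDER BY id").fetchall()


print(demo())
```

The sketch also shows why pre-existing triggers get in the way: the tool needs the trigger slots on the source table for itself, and in the MySQL versions discussed here a table could have only one trigger per action and timing.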
 
For the full pt-online-schema-change documentation, see: http://www.percona.com/doc/percona-toolkit/2.1/pt-online-schema-change.html
 
———————————————————————————————————————————————————————–
5.
mysqldump may fail with an error like this:
mysqldump: Got error: 1044: Access denied for user 'lseread'@'IP' to database 'lsedata_13q1' when doing LOCK TABLES
 
Solution:
Add --skip-lock-tables, for example:
mysqldump -h1.1.1.1  -uuser   -ppassword   -P3306 mydb mytb --where "time <= cast('2014-04-03 16:00' as datetime)" --skip-lock-tables --default-character-set=utf8  > mytb.txt 
 
MySQL 5.6 introduced GTID mode. When our developers dumped from a slave I had provided, they got the following warnings:
Warning: Using a password on the command line interface can be insecure.
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.
 
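Despite the warnings, the data is dumped. The annoying part comes when the dump file is imported into a server of another version, or one that does not have GTID mode enabled: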
ERROR 1839 (HY000) at line 24: @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_MODE = ON.
 
Reference:
gtid_executed:
When used with global scope, this variable contains a representation of the set of all transactions that are logged in the binary log. 
When used with session scope, it contains a representation of the set of transactions that are written to the cache in the current session.
Issuing RESET MASTER causes the global value (but not the session value) of this variable to be reset to an empty string.
 
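Solution:
Add --set-gtid-purged=OFF when dumping. This is the option named in the warning above; mysqldump has no --gtid-mode option. For example:
mysqldump -h1.1.1.1  -uuser   -ppassword   -P3306 mydb mytb --where "time <= cast('2014-04-03 16:00' as datetime)" --skip-lock-tables --default-character-set=utf8 --set-gtid-purged=OFF > mytb.txt 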
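When the dump is driven from a script, building the argument list in one place makes it harder to forget one of these options. The helper below is only a sketch: build_dump_command is a made-up name, the options are the long forms of the ones used above, and the password is deliberately left out on the assumption that it comes from an option file, which also avoids the first warning.

```python
def build_dump_command(host, port, user, database, table, where):
    """Build a mysqldump argument list for a partial dump without GTID statements."""
    return [
        "mysqldump",
        "--host=%s" % host,
        "--port=%d" % port,
        "--user=%s" % user,
        "--skip-lock-tables",            # the account lacks the LOCK TABLES privilege
        "--default-character-set=utf8",
        "--set-gtid-purged=OFF",         # keep SET @@GLOBAL.GTID_PURGED out of the dump
        "--where=%s" % where,
        database,
        table,
    ]


cmd = build_dump_command(
    "1.1.1.1", 3306, "user", "mydb", "mytb",
    "time <= cast('2014-04-03 16:00' as datetime)",
)
print(" ".join(cmd))
# To run it: subprocess.run(cmd, stdout=open("mytb.txt", "wb"), check=True)
```

Passing the list to subprocess without a shell means the --where expression needs no extra quoting.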
 
———————————————————————————————————————————————————————–
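6.
While I was adding a column to a large table, the system hung after a while and the ALTER was interrupted. After the reboot, retrying the ALTER failed with: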
mysql> alter table mytb add column yyy text default null;
ERROR 1050 (42S01): Table 'mydb/#sql-ib54' already exists
 
The MySQL error log shows:
2014-04-04 09:10:12 10578 [Note] /opt/mysql5.6/bin/mysqld: ready for connections.
Version: '5.6.17-log'  socket: '/opt/mysql5.6/data/mysql.sock'  port: 3307  Source distribution
2014-04-04 09:10:24 10578 [ERROR] InnoDB: Failed to find tablespace for table '"mydb"."#sql-ib54"' in the cache. Attempting to load the tablespace with space id 54.
2014-04-04 09:10:24 52e55940  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
2014-04-04 09:10:24 10578 [ERROR] InnoDB: Could not find a valid tablespace file for 'mydb/#sql-ib54'. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
 
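The data directory contains a number of files whose names begin with #:
# ls
-- other tables omitted; there are many .ibd files because mytb is a partitioned table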
#sql-ib58.ibd  #sql-ib65.ibd #sql-1935_2.frm  #sql-ib59.ibd  #sql-ib66.ibd
#sql-1935_2.par  #sql-ib60.ibd  #sql-ib67.ibd #sql-ib54.ibd    #sql-ib61.ibd
#sql-ib55.ibd    #sql-ib62.ibd #sql-ib56.ibd    #sql-ib63.ibd #sql-ib57.ibd    #sql-ib64.ibd
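Leftovers like the ones in this listing can be collected read-only before deciding what to do with them. The sketch below only lists file names and never deletes anything; list_alter_leftovers is a hypothetical helper, and the demo runs against a throwaway temporary directory rather than a real data directory.

```python
import tempfile
from pathlib import Path


def list_alter_leftovers(db_dir):
    """Return the names of files starting with '#sql' in a database directory."""
    return sorted(
        p.name for p in Path(db_dir).iterdir()
        if p.is_file() and p.name.startswith("#sql")
    )


# Demo on a temporary directory that imitates part of the listing above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("#sql-ib54.ibd", "#sql-1935_2.frm", "mytb.frm"):
        (Path(tmp) / name).touch()
    found = list_alter_leftovers(tmp)

print(found)
```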
 
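I had never met this problem before and had not studied the details. Assuming that files beginning with # were temporary and safe to remove, I deleted them without hesitation.
I then restarted the database, which logged a large number of errors: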
2014-04-04 09:10:12 2b1b9b20dfe0  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
2014-04-04 09:10:12 10578 [ERROR] InnoDB: Could not find a valid tablespace file for 'mydb/#sql-ib54'. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2014-04-04 09:10:12 10578 [ERROR] InnoDB: Tablespace open failed for '"mydb"."#sql-ib54"', ignored.
2014-04-04 09:10:12 2b1b9b20dfe0  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
2014-04-04 09:10:12 10578 [ERROR] InnoDB: Could not find a valid tablespace file for 'mydb/#sql-ib55'. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2014-04-04 09:10:12 10578 [ERROR] InnoDB: Tablespace open failed for '"mydb"."#sql-ib55"', ignored.
……
 
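Staying calm, I tried adding the column again, but the error persisted.
I then read the article referenced in the error log, http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html, and learned that this is what happens when table definition and tablespace files are deleted with shell commands in file-per-table mode (innodb_file_per_table).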
 
The article also describes a recovery method. I found the following passage:
Problem with Temporary Table
 
If MySQL crashes in the middle of an ALTER TABLE operation, you may end up with an orphaned temporary table inside the InnoDB tablespace. Using the Table Monitor, you can see listed a table with a name that begins with #sql-. You can perform SQL statements on tables whose name contains the character “#” if you enclose the name within backticks. Thus, you can drop such an orphaned table like any other orphaned table using the method described earlier. To copy or rename a file in the Unix shell, you need to put the file name in double quotation marks if the file name contains “#”.
 
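I created a new database with a table of the same structure and ran the same ADD COLUMN on it, which produced a fresh set of #sql- files in the new database's directory. I copied all of them into the original database's directory.
Then I renamed them all: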
# mv \#sql-2ff9_1.frm sql-2ff9_1.frm
# mv \#sql-2ff9_1.par sql-2ff9_1.par
# mv \#sql-ib82.ibd sql-ib82.ibd 
# mv \#sql-ib83.ibd sql-ib83.ibd 
# mv \#sql-ib84.ibd sql-ib84.ibd 
# mv \#sql-ib85.ibd sql-ib85.ibd 
# mv \#sql-ib86.ibd sql-ib86.ibd 
# mv \#sql-ib87.ibd sql-ib87.ibd 
# mv \#sql-ib88.ibd sql-ib88.ibd 
# mv \#sql-ib89.ibd sql-ib89.ibd 
# mv \#sql-ib90.ibd sql-ib90.ibd 
# mv \#sql-ib91.ibd sql-ib91.ibd 
# mv \#sql-ib92.ibd sql-ib92.ibd 
# mv \#sql-ib93.ibd sql-ib93.ibd 
# mv \#sql-ib94.ibd sql-ib94.ibd 
# mv \#sql-ib95.ibd sql-ib95.ibd 
 
After that, show tables lists the table:
 
mysql> show tables;
+---------------------+
| Tables_in_mydb      |
+---------------------+
| #mysql50#sql-2ff9_1 |
+---------------------+
1 row in set (0.01 sec)
 
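Unfortunately, I tried to drop the table #mysql50#sql-2ff9_1 but never managed to.
A lot of time had gone by, and the developers kept asking about progress.
So I turned to a different plan:
rename the table, create a new table with the same structure, copy the data into the new table, add the column to the new table, and drop the renamed table.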
 
This plan worked in practice. The only drawback is the data volume: copying the data takes a long time, and so does adding the column.
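The plan can be written down as a statement sequence. The generator below is a sketch with made-up names (rebuild_plan, the _old suffix, the sample columns), and it makes one change to the order described above, which is an assumption rather than what was actually run: the column is added while the new table is still empty, so the data is written only once instead of being copied and then rebuilt by the ALTER.

```python
def rebuild_plan(table, column_def, columns):
    """Return the SQL statements for the rename / recreate / copy / drop plan."""
    old = table + "_old"
    column_list = ", ".join("`%s`" % c for c in columns)
    return [
        "RENAME TABLE `%s` TO `%s`;" % (table, old),
        "CREATE TABLE `%s` LIKE `%s`;" % (table, old),
        # Variation: add the column while the new table is still empty.
        "ALTER TABLE `%s` ADD COLUMN %s;" % (table, column_def),
        "INSERT INTO `%s` (%s) SELECT %s FROM `%s`;" % (table, column_list, column_list, old),
        "DROP TABLE `%s`;" % old,
    ]


plan = rebuild_plan("mytb", "yyy TEXT DEFAULT NULL", ["id", "name"])
for statement in plan:
    print(statement)
```

The explicit column list in the INSERT is needed because the new table has one more column than the old one. The DROP is best run only after the row counts of both tables have been compared.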
 
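Some people online claim that deleting ibdata1 and ib_logfile0/ib_logfile1 and then restarting the database fixes this. Does that work?
The answer is of course no. Even in file-per-table mode, ibdata holds the data dictionary and the undo information. If it is deleted, a brand-new ibdata is created at restart, and the dictionary and undo information are lost.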
 
This is what happens then:
mysql> show tables;
+----------------+
| Tables_in_mydb |
+----------------+
| t1             |
+----------------+
1 row in set (0.01 sec)
 
mysql> select count(*) from t1;
ERROR 1146 (42S02): Table 'mydb.t1' doesn't exist

———————————————————————————————————————————————————————–

 

7.
Replication broke; the slave's log shows these errors:
140405  4:16:12 [ERROR] Slave I/O: error reconnecting to master 'rep1@10.13.34.199:3306' - retry-time: 60  retries: 86400, Error_code: 2003
140405  6:53:12 [Note] Slave: connected to master 'rep1@10.13.34.199:3306',replication resumed in log 'mysql-bin.000275' at position 192295247
140405  6:53:12 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
140405  6:53:12 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file', Error_code: 1236
140405  6:53:12 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000275', position 192295247
140405  6:54:11 [Note] Error reading relay log event: slave SQL thread was killed
140405  6:54:11 [Note] Slave I/O thread: connected to master 'rep1@10.13.34.199:3306',replication started in log 'mysql-bin.000275' at position 192295247
140405  6:54:11 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
140405  6:54:11 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file', Error_code: 1236
140405  6:54:11 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000275', position 192295247
140405  6:54:11 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000275' at position 192295247, relay log './SHUBEI-34-198-relay-bin.000153' position: 192295393
 
Looking up the given position in the given binlog, I found that it is the very end of the file; the master switched to a new binlog right after it:
# mysqlbinlog mysql-bin.000275 | grep -A 10 192295247 
#140405  3:16:06 server id 1  end_log_pos 192295247     Xid = 468032712
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
 
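That makes the fix straightforward: point the slave at the new binlog. This is safe here because the slave had already read mysql-bin.000275 to its end, and the master is still at position 107 of the new file, which in 5.5 is the offset of the first event after the format description, so no events are skipped.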
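The name of the next binlog follows mechanically from the old one, so the statement can be generated. The sketch below uses two made-up helpers; the default position 4 is an assumption of this example (it is the offset right after the 4-byte magic header, where the first event of any binlog starts), while the incident here used 107 as reported by the master. It only applies when the old file was read to its end, as verified with mysqlbinlog above.

```python
def next_binlog_name(name):
    """Return the binlog file name that follows `name`, keeping the zero padding."""
    base, seq = name.rsplit(".", 1)
    return "%s.%s" % (base, str(int(seq) + 1).zfill(len(seq)))


def change_master_statement(old_file, log_pos=4):
    """Build a CHANGE MASTER TO statement that starts at the next binlog file."""
    return "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;" % (
        next_binlog_name(old_file),
        log_pos,
    )


print(next_binlog_name("mysql-bin.000275"))
print(change_master_statement("mysql-bin.000275", 107))
```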
 
On the master, run show master status:
mysql> show master status;
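+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000276 |      107 |              | test             |
+------------------+----------+--------------+------------------+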
1 row in set (0.00 sec)
 
Reconfigure replication on the slave:
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
 
mysql> change master to master_host='10.13.34.199',master_port=3306,master_user='rep1',master_password='RepSlavE&2013', master_log_file='mysql-bin.000276',master_log_pos=107; 
Query OK, 0 rows affected (0.05 sec)
 
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
 
Check the replication status:
mysql> show slave status\G
-- output omitted --
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

———————————————————————————————————————————————————————–
8.
Fixing ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository
After I changed the relay-log naming setting and restarted MySQL, replication failed to start:
 
mysql> start slave;
ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository
 
MySQL error log:
2014-02-20 16:35:19 27094 [ERROR] Failed to open the relay log './luocs166-relay-bin.000007' (relay_log_pos 359).
2014-02-20 16:35:19 27094 [ERROR] Could not find target log file mentioned in relay log info in the index file '/opt/mysql/data/anav-relay-log.index' during relay log initialization.
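The two log lines already contain the diagnosis: the relay log info still points at a file with the old base name, while the index file carries the new one. The sketch below extracts both base names and compares them; relay_log_mismatch is a hypothetical helper, and the regular expressions are written only against the message format shown above.

```python
import re


def relay_log_mismatch(error_log):
    """Return (name_in_relay_log_info, name_of_index, differ) or None if not found."""
    used = re.search(r"Failed to open the relay log '(?:\./)?([^']+)\.\d+'", error_log)
    index = re.search(r"index file '(?:[^']*/)?([^'/]+)\.index'", error_log)
    if not used or not index:
        return None
    return used.group(1), index.group(1), used.group(1) != index.group(1)


sample = (
    "2014-02-20 16:35:19 27094 [ERROR] Failed to open the relay log "
    "'./luocs166-relay-bin.000007' (relay_log_pos 359).\n"
    "2014-02-20 16:35:19 27094 [ERROR] Could not find target log file mentioned in "
    "relay log info in the index file '/opt/mysql/data/anav-relay-log.index' "
    "during relay log initialization.\n"
)
print(relay_log_mismatch(sample))
```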
 
Solution:
While reading the help for CHANGE MASTER TO, I found the following passage:
mysql> ? change master to
-- most of the output omitted
The next example shows an operation that is less frequently employed.
It is used when the slave has relay log files that you want it to
execute again for some reason. To do this, the master need not be
reachable. You need only use CHANGE MASTER TO and start the SQL thread
(START SLAVE SQL_THREAD):
 
CHANGE MASTER TO
  RELAY_LOG_FILE='slave-relay-bin.006',
  RELAY_LOG_POS=4025;
 
But my database is version 5.6 with GTID mode enabled, and the statement fails:
mysql> CHANGE MASTER TO
    -> RELAY_LOG_FILE='anav-relay-log.000001',
    -> RELAY_LOG_POS=120;
ERROR 1776 (HY000): Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.
 
So I resolved it as follows:
mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)
 
mysql> CHANGE MASTER TO
    -> MASTER_HOST='10.19.3.168',
    -> MASTER_USER='repl2',
    -> MASTER_PASSWORD='oracle',
    -> MASTER_AUTO_POSITION = 1;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
 
mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)
 
mysql> show slave status\G
-- part of the output omitted --
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes